Milestone 1¶

Context:¶

PROJECT BACKGROUND: There is a huge demand for used cars in the Indian Market today. As sales of new cars have slowed down in the recent past, the pre-owned car market has continued to grow over the past few years and is now larger than the new car market. Cars4U is a budding tech start-up that aims to find footholes in this market.

In 2018-19, while new car sales were recorded at 3.6 million units, around 4 million second-hand cars were bought and sold. There is a slowdown in new car sales and that could mean that the demand is shifting towards the pre-owned market. In fact, some car owners replace their old vehicles with pre-owned cars instead of buying a new automobile.

Unlike new cars, where price and supply are fairly deterministic and managed by OEMs (Original Equipment Manufacturer / except for dealership level discounts which come into play only in the last stage of the customer journey), the used car market is a very different beast, with large uncertainties in both pricing and supply. Several factors, including mileage, brand, model, year, etc. can influence the actual worth of a car. From the perspective of a seller, it is not an easy task to set the correct price of a used car. Keeping this in mind, the pricing scheme of these used cars becomes important in order to grow in the market.

BUSINESS CONTEXT: Cars4U, a tech startup in India, want to sell software to used car dealerships in a B2B business model. They would like to also target a larger market, the end customer in a B2C model where end users could use the price predictor model to negotiate with car sellers (a fremium model with upsell for more functionality; advertisement revenue) - but this is their future market strategy. Firstly, their initial target market will be used car sellers where the value proposition is to provide a more accurate understanding of the price of a used car beyond the traditional make, model, year. Based on this information, the dealership can decide to add a premium on top of the given price to the customer; they can also opt to reject any car that they get from auction if the price is too high. For this, they are using historical data that contains other variables such as number of previous owner, the transmission types, etc. (see Data Dictionary). They intend to use this information to create a machine learning model to first prove their hypothesis that other factors do indeed play a role in the price of a used car. They can then productize the model in the form of an app that can be deployed and that will provide an easy to use interface for query and results. The app itself may have other features such as history of similar cars sold in the region, etc. but first they have to see if their hunch is correct.

The objective:¶

Determine an accurate pricing model that will effectively predict the selling price of used cars in order to enable Cars4U to sell software that will allow its customers to devise profitable revenue strategies using differential pricing in the Indian market.

The key questions:¶

  1. The most important question is related to determining what are the key factors affecting the selling price of a used car. We need to examine every independent variable and make this determination. In the case where domain knowledge is required, we will use Google.

  2. How confident are we in our findings?

  3. Can we justify our assumptions?

  4. Can we justify our findings?

  5. How will we ensure the accuracy of our chosen model?

The problem formulation:¶

Create an accuracte and defensible supervised machine learning model that will provide an accurate selling price for a used car based on the features provided in the data dictionary.

Data Dictionary¶

S.No. : Serial Number

Name : Name of the car which includes Brand name and Model name

Location : The location in which the car is being sold or is available for purchase (Cities)

Year : Manufacturing year of the car

Kilometers_driven : The total kilometers driven in the car by the previous owner(s) in KM

Fuel_Type : The type of fuel used by the car (Petrol, Diesel, Electric, CNG, LPG)

Transmission : The type of transmission used by the car (Automatic / Manual)

Owner : Type of ownership

Mileage : The standard mileage offered by the car company in kmpl or km/kg

Engine : The displacement volume of the engine in CC

Power : The maximum power of the engine in bhp

Seats : The number of seats in the car

New_Price : The price of a new car of the same model in INR 100,000

Price : The price of the used car in INR 100,000 (Target Variable)

Important Notes¶

  • This notebook can be considered a guide to refer to while solving the problem. The evaluation will be as per the Rubric shared for each Milestone. Unlike previous courses, it does not follow the pattern of the graded questions in different sections. This notebook will give you a direction on what steps need to be taken in order to get a viable solution to the problem. Please note that this is just one way of doing this. There can be other 'creative' ways to solve the problem and we urge you to feel free and explore them as an 'optional' exercise.

  • In the notebook, there are markdown cells called - Observations and Insights. It is a good practice to provide observations and extract insights from the outputs.

  • The naming convention for different variables can vary. Please consider the code provided in this notebook as a sample code.

  • All the outputs in the notebook are just for reference and can be different if you follow a different approach.

  • There are sections called Think About It in the notebook that will help you get a better understanding of the reasoning behind a particular technique/step. Interested learners can take alternative approaches if they wish to explore different techniques.

Milestone 1¶

Loading libraries¶

In [1]:
# Import libraries for data manipulation
import pandas as pd
import numpy as np

# Import libraries for data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from statsmodels.graphics.gofplots import ProbPlot

# Import libraries for building linear regression model
from statsmodels.formula.api import ols
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# Import library for preparing data
from sklearn.model_selection import train_test_split

# Import library for data preprocessing
from sklearn.preprocessing import MinMaxScaler

# To ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Remove the limit from the number of displayed columns and rows. It helps to see the entire dataframe while printing it
pd.set_option("display.max_columns", None)

# To get visualization on missing values
#!pip install missingno
import missingno as msno

Let us load the data¶

In [2]:
df = pd.read_csv("used_cars.csv")

Understand the data by observing a few rows¶

In [3]:
df.head()
Out[3]:
S.No. Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats New_price Price
0 0 Maruti Wagon R LXI CNG Mumbai 2010 72000 CNG Manual First 26.60 998.0 58.16 5.0 NaN 1.75
1 1 Hyundai Creta 1.6 CRDi SX Option Pune 2015 41000 Diesel Manual First 19.67 1582.0 126.20 5.0 NaN 12.50
2 2 Honda Jazz V Chennai 2011 46000 Petrol Manual First 18.20 1199.0 88.70 5.0 8.61 4.50
3 3 Maruti Ertiga VDI Chennai 2012 87000 Diesel Manual First 20.77 1248.0 88.76 7.0 NaN 6.00
4 4 Audi A4 New 2.0 TDI Multitronic Coimbatore 2013 40670 Diesel Automatic Second 15.20 1968.0 140.80 5.0 NaN 17.74
In [4]:
df.tail()
Out[4]:
S.No. Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats New_price Price
7248 7248 Volkswagen Vento Diesel Trendline Hyderabad 2011 89411 Diesel Manual First 20.54 1598.0 103.6 5.0 NaN NaN
7249 7249 Volkswagen Polo GT TSI Mumbai 2015 59000 Petrol Automatic First 17.21 1197.0 103.6 5.0 NaN NaN
7250 7250 Nissan Micra Diesel XV Kolkata 2012 28000 Diesel Manual First 23.08 1461.0 63.1 5.0 NaN NaN
7251 7251 Volkswagen Polo GT TSI Pune 2013 52262 Petrol Automatic Third 17.20 1197.0 103.6 5.0 NaN NaN
7252 7252 Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan... Kochi 2014 72443 Diesel Automatic First 10.00 2148.0 170.0 5.0 NaN NaN

Observations and Insights: _

The price of the car indicated by the variable Price and is the target variable. The rest of the variables are independent variables on which we will predict the price of the car. There are NaNs in the database, especially in the New_price and Price columns. I also understand what Owner Type means now: whether this is the first, second, third, etc owner of the car. This is actually valuable information. For example, if a car has changed hands several times during a short period of time, it may indicate a problem with the car. Serial number seems to be a unique identifier for each car, but since we are not looking for indiviual cars but categories, this column may be unnecessary - to be determined.

Let us check the data types and and missing values of each column¶

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7253 entries, 0 to 7252
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   S.No.              7253 non-null   int64  
 1   Name               7253 non-null   object 
 2   Location           7253 non-null   object 
 3   Year               7253 non-null   int64  
 4   Kilometers_Driven  7253 non-null   int64  
 5   Fuel_Type          7253 non-null   object 
 6   Transmission       7253 non-null   object 
 7   Owner_Type         7253 non-null   object 
 8   Mileage            7251 non-null   float64
 9   Engine             7207 non-null   float64
 10  Power              7078 non-null   float64
 11  Seats              7200 non-null   float64
 12  New_price          1006 non-null   float64
 13  Price              6019 non-null   float64
dtypes: float64(6), int64(3), object(5)
memory usage: 793.4+ KB
In [6]:
#count of unique values in features
df.nunique()
Out[6]:
S.No.                7253
Name                 2041
Location               11
Year                   23
Kilometers_Driven    3660
Fuel_Type               5
Transmission            2
Owner_Type              4
Mileage               438
Engine                150
Power                 383
Seats                   8
New_price             625
Price                1373
dtype: int64
In [7]:
# Check total number of missing values of each column. Hint: Use isnull() method
df.isnull().sum()
Out[7]:
S.No.                   0
Name                    0
Location                0
Year                    0
Kilometers_Driven       0
Fuel_Type               0
Transmission            0
Owner_Type              0
Mileage                 2
Engine                 46
Power                 175
Seats                  53
New_price            6247
Price                1234
dtype: int64

Observations and Insights: _

We can observe that S.No. has no null values. Also the number of unique values are equal to the number of observations. So, S.No. looks like an index for the data entry and such a column would not be useful in providing any predictive power for our analysis. Hence, it can be dropped. The 2 price columns have nulls.

We can also observe that there are a lot of missing values of New_price. Price is high as well. Engine, Power and Seats also have missing values, but it's not as bad.

In [8]:
# Remove S.No. column from data. Hint: Use inplace = True

df.drop(columns=['S.No.'], inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7253 entries, 0 to 7252
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Name               7253 non-null   object 
 1   Location           7253 non-null   object 
 2   Year               7253 non-null   int64  
 3   Kilometers_Driven  7253 non-null   int64  
 4   Fuel_Type          7253 non-null   object 
 5   Transmission       7253 non-null   object 
 6   Owner_Type         7253 non-null   object 
 7   Mileage            7251 non-null   float64
 8   Engine             7207 non-null   float64
 9   Power              7078 non-null   float64
 10  Seats              7200 non-null   float64
 11  New_price          1006 non-null   float64
 12  Price              6019 non-null   float64
dtypes: float64(6), int64(2), object(5)
memory usage: 736.8+ KB

Exploratory Data Analysis¶

Let us now explore the summary statistics of numerical variables¶

In [9]:
df.describe().T
Out[9]:
count mean std min 25% 50% 75% max
Year 7253.0 2013.365366 3.254421 1996.00 2011.000 2014.00 2016.0000 2019.00
Kilometers_Driven 7253.0 58699.063146 84427.720583 171.00 34000.000 53416.00 73000.0000 6500000.00
Mileage 7251.0 18.141580 4.562197 0.00 15.170 18.16 21.1000 33.54
Engine 7207.0 1616.573470 595.285137 72.00 1198.000 1493.00 1968.0000 5998.00
Power 7078.0 112.765214 53.493553 34.20 75.000 94.00 138.1000 616.00
Seats 7200.0 5.280417 0.809277 2.00 5.000 5.00 5.0000 10.00
New_price 1006.0 22.779692 27.759344 3.91 7.885 11.57 26.0425 375.00
Price 6019.0 9.479468 11.187917 0.44 3.500 5.64 9.9500 160.00

Observations and Insights: _

We can derive the following observations and insights:

  1. For 'Year' variable, the distribution is likely to be roughly symmetric and unimodal, with a peak around 2013-2014 and a long tail towards both sides (older and newer years). This indicates that the majority of cars in the dataset were manufactured around 2013-2014, with relatively fewer cars manufactured in the earlier or later years.

  2. For Kilometers_Driven, the max appears to be an outlier since it is highly unlikely that a vehicle has been driven for 6.5 million kilometers.

  3. For Mileage, the min cannot be 0.

Let us also explore the summary statistics of all categorical variables and the number of unique observations in each category¶

In [10]:
# Explore basic summary statistics of categorical variables. Hint: Use the argument include = ['object'] 
for i in df.columns:
    
    plt.figure(figsize = (7, 4))
    
    sns.histplot(data = df, x = i, kde = True)
    
    plt.show()

Number of unique observations in each category

In [11]:
cat_cols = df.select_dtypes(include = ['object']).columns

for column in cat_cols:
    
    print("For column:", column)
    
    print(df[column].value_counts())
    
    print('-'*50)
For column: Name
Mahindra XUV500 W8 2WD                  55
Maruti Swift VDI                        49
Maruti Swift Dzire VDI                  42
Honda City 1.5 S MT                     39
Maruti Swift VDI BSIV                   37
                                        ..
Chevrolet Beat LT Option                 1
Skoda Rapid 1.6 MPI AT Elegance Plus     1
Ford EcoSport 1.5 TDCi Ambiente          1
Hyundai i10 Magna 1.1 iTech SE           1
Hyundai Elite i20 Magna Plus             1
Name: Name, Length: 2041, dtype: int64
--------------------------------------------------
For column: Location
Mumbai        949
Hyderabad     876
Coimbatore    772
Kochi         772
Pune          765
Delhi         660
Kolkata       654
Chennai       591
Jaipur        499
Bangalore     440
Ahmedabad     275
Name: Location, dtype: int64
--------------------------------------------------
For column: Fuel_Type
Diesel      3852
Petrol      3325
CNG           62
LPG           12
Electric       2
Name: Fuel_Type, dtype: int64
--------------------------------------------------
For column: Transmission
Manual       5204
Automatic    2049
Name: Transmission, dtype: int64
--------------------------------------------------
For column: Owner_Type
First             5952
Second            1152
Third              137
Fourth & Above      12
Name: Owner_Type, dtype: int64
--------------------------------------------------

Observations and Insights: _¶

  1. Certain cars are more popular than others as indicated by Name. However, the most popular car had only 55 entires out of a dataset of 7253 records. Therefore, we are dealing with a large variety of cars.
  2. Bigger cities have more cars. Not unexpected. Mumbai has the most, followed by Hyderabad.
  3. Most cars are Diesel or Petrol.
  4. Manual cars are more popular than Automatic.
  5. Most cars are first owner cars, meaning that most of the cars are being sold by the owner of the new car.

Think About It:

  • We could observe from summary statistics that kilometers_driven has extreme values. Can we look at the manufactured year for cars with extreme values for kilometers_driven?
  • Also, we could observe the feature mileage has values zero. Can the mileage of a car be zero?

Let's explore the two points mentioned above

Check Kilometers_Driven extreme values

In [12]:
# Sort the dataset in 'descending' order using the feature 'Kilometers_Driven'
df.sort_values('Kilometers_Driven', ascending=False).head(10)
Out[12]:
Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats New_price Price
2328 BMW X5 xDrive 30d M Sport Chennai 2017 6500000 Diesel Automatic First 15.97 2993.0 258.00 5.0 NaN 65.00
340 Skoda Octavia Ambition Plus 2.0 TDI AT Kolkata 2013 775000 Diesel Automatic First 19.30 1968.0 141.00 5.0 NaN 7.50
1860 Volkswagen Vento Diesel Highline Chennai 2013 720000 Diesel Manual First 20.54 1598.0 103.60 5.0 NaN 5.90
358 Hyundai i10 Magna 1.2 Chennai 2009 620000 Petrol Manual First 20.36 1197.0 78.90 5.0 NaN 2.70
2823 Volkswagen Jetta 2013-2015 2.0L TDI Highline AT Chennai 2015 480000 Diesel Automatic First 16.96 1968.0 138.03 5.0 NaN 13.00
3092 Honda City i VTEC SV Kolkata 2015 480000 Petrol Manual First 17.40 1497.0 117.30 5.0 NaN 5.00
4491 Hyundai i20 Magna Optional 1.2 Bangalore 2013 445000 Petrol Manual First 18.50 1197.0 82.90 5.0 NaN 4.45
6921 Maruti Swift Dzire Tour LDI Jaipur 2012 350000 Diesel Manual First 23.40 1248.0 74.00 5.0 NaN NaN
3649 Tata Indigo LS Jaipur 2008 300000 Diesel Manual First 17.00 1405.0 70.00 5.0 NaN 1.00
1528 Toyota Innova 2.5 G (Diesel) 8 Seater BS IV Hyderabad 2005 299322 Diesel Manual First 12.80 2494.0 102.00 8.0 NaN 4.00

Observations and Insights: _¶

In the first row, a car manufactured as recently as 2017 having been driven 6500000 km is almost impossible. It can be considered as data entry error and so we can remove this value/entry from data.

In [13]:
# Removing the 'row' at index 2328 from the data. Hint: use the argument inplace=True
df.drop(index=2328, inplace=True)

Check Mileage extreme values

In [14]:
# Sort the dataset in 'ascending' order using the feature 'Mileage'
df.sort_values('Mileage').head(10)
Out[14]:
Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats New_price Price
2597 Hyundai Santro Xing XP Pune 2007 70000 Petrol Manual First 0.0 1086.0 NaN 5.0 NaN 1.12
2343 Hyundai Santro AT Hyderabad 2006 74483 Petrol Automatic First 0.0 999.0 NaN 5.0 NaN 2.30
5270 Honda City 1.5 GXI Bangalore 2002 53000 Petrol Manual Second 0.0 NaN NaN NaN NaN 1.85
424 Volkswagen Jetta 2007-2011 1.9 L TDI Hyderabad 2010 42021 Diesel Manual First 0.0 1968.0 NaN 5.0 NaN 5.45
6857 Land Rover Freelander 2 TD4 SE Mumbai 2011 87000 Diesel Automatic First 0.0 2179.0 115.0 5.0 NaN NaN
443 Hyundai Santro GLS I - Euro I Coimbatore 2012 50243 Petrol Manual First 0.0 1086.0 NaN 5.0 NaN 3.35
5119 Hyundai Santro Xing XP Kolkata 2008 45500 Petrol Manual Second 0.0 1086.0 NaN 5.0 NaN 1.17
5022 Land Rover Freelander 2 TD4 SE Hyderabad 2013 46000 Diesel Automatic Second 0.0 2179.0 115.0 5.0 NaN 26.00
5016 Land Rover Freelander 2 TD4 HSE Delhi 2013 72000 Diesel Automatic First 0.0 2179.0 115.0 5.0 NaN 15.50
2542 Hyundai Santro GLS II - Euro II Bangalore 2011 65000 Petrol Manual Second 0.0 NaN NaN NaN NaN 3.15

Observations¶

  • Mileage of cars can not be 0, so we should treat 0's as missing values. We will do it in the Feature Engineering part.

Univariate Analysis¶

Univariate analysis is used to explore each variable in a data set, separately. It looks at the range of values, as well as the central tendency of the values. It can be done for both numerical and categorical variables.

1. Univariate Analysis - Numerical Data¶

Histograms and box plots help to visualize and describe numerical data. We use box plot and histogram to analyse the numerical columns.

In [15]:
# Let us write a function that will help us create a boxplot and histogram for any input numerical variable.
# This function takes the numerical column as the input and returns the boxplots and histograms for the variable.

def histogram_boxplot(feature, figsize = (15, 10), bins = None):
    
    """ Boxplot and histogram combined
    
    feature: 1-d feature array
    
    figsize: size of fig (default (9, 8))
    
    bins: number of bins (default None / auto)
    
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid = 2
                                           sharex = True, # X-axis will be shared among all subplots
                                           gridspec_kw = {"height_ratios": (.25, .75)}, 
                                           figsize = figsize 
                                           ) # Creating the 2 subplots
    
    sns.boxplot(feature, ax = ax_box2, showmeans = True, color = 'violet') # Boxplot will be created and a symbol will indicate the mean value of the column
    
    sns.distplot(feature, kde = F, ax = ax_hist2, bins = bins, palette = "winter") if bins else sns.distplot(feature, kde = False, ax = ax_hist2) # For histogram
    
    ax_hist2.axvline(np.mean(feature), color = 'green', linestyle = '--') # Add mean to the histogram
    
    ax_hist2.axvline(np.median(feature), color = 'black', linestyle = '-') # Add median to the histogram

Let us plot histogram and box-plot for the feature 'Kilometers_Driven' to understand the distribution and outliers, if any.

In [16]:
# Plot histogram and box-plot for 'Kilometers_Driven'
histogram_boxplot(df['Kilometers_Driven'])

Think About It: Kilometers_Driven is highly right-skewed. Can we use Log transformation of the feature to reduce/remove the skewness? Why can't we keep skewed data?

Log transformation can be used to reduce the skewness of a feature, in particular when the feature is highly right-skewed. Log transformation can make the distribution more symmetric and closer to a normal distribution. The log transformation can also make the data more interpretable as it can reduce the effect of outliers.

However, the right-skewed distribution can cause issues when building models because the model may be sensitive to the outliers present in the right tail. This can lead to overfitting or poor generalization performance of the model.

Furthermore, when data is skewed, it can also lead to inaccurate estimates of model parameters and a decrease in the power of statistical tests. Also, some algorithms such as linear and logistic regression assume that the data is normally distributed, and therefore, a skewed feature can lead to biased or inefficient estimates of the model parameters.

In [17]:
# Log transformation of the feature 'Kilometers_Driven'
sns.distplot(np.log(df["Kilometers_Driven"]), axlabel = "Log(Kilometers_Driven)");

Observations and Insights: _

This is better, but now shows a slight left-skewedness.

In [18]:
# We can add a transformed kilometers_driven feature in data
df["kilometers_driven_log"] = np.log(df["Kilometers_Driven"])

Note: Like Kilometers_Driven, the distribution of Price is also highly skewed, we can use log transformation on this column to see if that helps normalize the distribution. And add the transformed variable into the dataset. You can name the variable as 'price_log'.

In [19]:
# Plot histogram and box-plot for 'Price'
histogram_boxplot(df['Price'])
In [20]:
# Log transformation of the feature 'Price'
sns.distplot(np.log(df["Price"]), axlabel = "Log(Price)")
Out[20]:
<AxesSubplot:xlabel='Log(Price)', ylabel='Density'>
In [21]:
# We can Add a transformed Price feature in data
df["price_log"] = np.log(df["Price"])

Note: Try plotting histogram and box-plot for different numerical features and understand how the data looks like.

In [22]:
# Plot histogram and box-plot for 'Mileage'
histogram_boxplot(df['Mileage'])
In [23]:
# Log transformation of the feature 'Mileage'
#sns.distplot(np.log(df["Mileage"]), axlabel = "Log(Mileage)")
#this code results in error due to zeros in the dataset for Mileage
In [24]:
# Plot histogram and box-plot for 'Engine'
histogram_boxplot(df['Engine'])
In [25]:
# Log transformation of the feature 'Engine'
sns.distplot(np.log(df["Engine"]), axlabel = "Log(Engine)");
In [26]:
# Plot histogram and box-plot for 'Power'
histogram_boxplot(df['Power'])
In [27]:
# Log transformation of the feature 'Power'
sns.distplot(np.log(df["Power"]), axlabel = "Log(Power)");
In [28]:
# Plot histogram and box-plot for 'New_price'
histogram_boxplot(df['New_price'])
In [29]:
# Log transformation of the feature 'New_price'
sns.distplot(np.log(df["New_price"]), axlabel = "Log(New_price)");
In [30]:
# We can Add a transformed New_price feature in data
df["new_price_log"] = np.log(df["New_price"])
In [31]:
# Creating histograms
df.hist(figsize = (14, 14))

plt.show()
In [32]:
df['Year'] = df['Year'].astype('object')
df['Seats'] = df['Seats'].astype('object')

# Creating histograms
df.hist(figsize = (14, 14))

df.info()

plt.show()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   7252 non-null   object 
 1   Location               7252 non-null   object 
 2   Year                   7252 non-null   object 
 3   Kilometers_Driven      7252 non-null   int64  
 4   Fuel_Type              7252 non-null   object 
 5   Transmission           7252 non-null   object 
 6   Owner_Type             7252 non-null   object 
 7   Mileage                7250 non-null   float64
 8   Engine                 7206 non-null   float64
 9   Power                  7077 non-null   float64
 10  Seats                  7199 non-null   object 
 11  New_price              1006 non-null   float64
 12  Price                  6018 non-null   float64
 13  kilometers_driven_log  7252 non-null   float64
 14  price_log              6018 non-null   float64
 15  new_price_log          1006 non-null   float64
dtypes: float64(8), int64(1), object(7)
memory usage: 963.2+ KB

Observations and Insights for all the plots: _

  1. Mileage is roughly left skewed, so we can transform. However, there are a lot of zeros which need to be cleared out first before we can do this.

  2. --> Engine is multimodal. Need to check how to handle this.

  3. -->Power is bimodal. Need to check how to handle this.

  4. -->New_price is bimodal. Need to check how to handle this.

  5. We are correct in transforming Price and Kilometers_Driven.

  6. Year and Seats are a numerical variable. It is better to convert them to object variables.

2. Univariate analysis - Categorical Data¶

In [33]:
# Let us write a function that will help us create barplots that indicate the percentage for each category.
# This function takes the categorical column as the input and returns the barplots for the variable.

def perc_on_bar(z):
    '''
    plot
    feature: categorical feature
    the function won't work if a column is passed in hue parameter
    '''

    total = len(df[z]) # Length of the column
    
    plt.figure(figsize = (15, 5))
    
    ax = sns.countplot(df[z], palette = 'Paired', order = df[z].value_counts().index)
    
    for p in ax.patches:
        
        percentage = '{:.1f}%'.format(100 * p.get_height() / total) # Percentage of each class of the category
        
        x = p.get_x() + p.get_width() / 2 - 0.05 # Width of the plot
        
        y = p.get_y() + p.get_height()           # Hieght of the plot
        
        ax.annotate(percentage, (x, y), size = 12) # Annotate the percantage 
    
    plt.show() # Show the plot

Let us plot barplot for the variable location. It will be helpful to know the number of percentage of cars from each city.

In [34]:
# Bar Plot for 'Location'
perc_on_bar('Location')

Note: Explore for other variables like Year, Fuel_Type, Transmission, Owner_Type`.

In [35]:
# Bar Plot for 'Name'
perc_on_bar('Name')
In [36]:
# Bar Plot for 'Year'
perc_on_bar('Year')
In [37]:
# Bar Plot for 'Fuel_Type'
perc_on_bar('Fuel_Type')
In [38]:
# Bar Plot for 'Transmission'
perc_on_bar('Transmission')
In [39]:
# Bar Plot for 'Owner_Type'
perc_on_bar('Owner_Type')
In [40]:
# Bar Plot for 'Seats'
perc_on_bar('Seats')
In [41]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   7252 non-null   object 
 1   Location               7252 non-null   object 
 2   Year                   7252 non-null   object 
 3   Kilometers_Driven      7252 non-null   int64  
 4   Fuel_Type              7252 non-null   object 
 5   Transmission           7252 non-null   object 
 6   Owner_Type             7252 non-null   object 
 7   Mileage                7250 non-null   float64
 8   Engine                 7206 non-null   float64
 9   Power                  7077 non-null   float64
 10  Seats                  7199 non-null   object 
 11  New_price              1006 non-null   float64
 12  Price                  6018 non-null   float64
 13  kilometers_driven_log  7252 non-null   float64
 14  price_log              6018 non-null   float64
 15  new_price_log          1006 non-null   float64
dtypes: float64(8), int64(1), object(7)
memory usage: 963.2+ KB

Observations and Insights from all plots: _

  1. Most cars in the dataset are sedans (5 seaters)

  2. Most cars are sold by the original owner of the car

  3. Most cars are Manual transmission

  4. Diesel and petrol cars are roughly equal (with petrol cars being slighly lower)

  5. There are older cars in dataset which is right skewed

  6. Some brands are more popular than others, but there are a lot of brands for sale

  7. Over 60% of the used cars sold are in Mumbai, Hyderabad, Coimbatore, Kochi and Pune.

For name, we need to eliminate the Name or transform it to categorial

Bivariate Analysis¶

1. Scatter plot¶

A scatter plot allows us to see relationships between two variables.

Note: Use log transformed values 'kilometers_driven_log' and 'price_log'

In [42]:
# Let us plot pair plot for the variables 'year' and 'price_log'
df.plot(x = 'price_log', y = 'Year', style = 'o')
Out[42]:
<AxesSubplot:xlabel='price_log'>

Note: Try to explore different combinations of independent variables and dependent variable. Understand the relationship between all variables.

In [97]:
sns.pairplot(df)
import matplotlib.pyplot as plt
plt.show()

Observations and Insights from all plots: _

  1. Price_log and Year shows a positive correlation
  2. Year and Kilometers_driven_log has negative correlation.
  3. Year and New_price shows a positive correlation
  4. Power and Year show as positive correlation which is interesting. Do newer cars have more Power?
  5. Mileage and Price_log are negatively correlated
  6. Mileage and Power are negatively correlated
  7. Mileage and Engine are negatively correlated
  8. Engine and Power are positively correlated
  9. Engine and Kilometers_driven_log are negatively correlated
  10. Power and Mileage are negatively correalated
  11. Power and Price_log are positively correlated
  12. Power and New_price are positively correlated

2. Heat map¶

Heat map shows a 2D correlation matrix between two numerical features.

In [44]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   7252 non-null   object 
 1   Location               7252 non-null   object 
 2   Year                   7252 non-null   object 
 3   Kilometers_Driven      7252 non-null   int64  
 4   Fuel_Type              7252 non-null   object 
 5   Transmission           7252 non-null   object 
 6   Owner_Type             7252 non-null   object 
 7   Mileage                7250 non-null   float64
 8   Engine                 7206 non-null   float64
 9   Power                  7077 non-null   float64
 10  Seats                  7199 non-null   object 
 11  New_price              1006 non-null   float64
 12  Price                  6018 non-null   float64
 13  kilometers_driven_log  7252 non-null   float64
 14  price_log              6018 non-null   float64
 15  new_price_log          1006 non-null   float64
dtypes: float64(8), int64(1), object(7)
memory usage: 963.2+ KB
In [45]:
# We can include the log transformation values and drop the original skewed data columns
plt.figure(figsize = (12, 7))
sns.heatmap(df.drop(columns=['Kilometers_Driven','Price'],axis = 1).corr(), annot = True, vmin = -1, vmax = 1)
plt.show()
In [46]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   7252 non-null   object 
 1   Location               7252 non-null   object 
 2   Year                   7252 non-null   object 
 3   Kilometers_Driven      7252 non-null   int64  
 4   Fuel_Type              7252 non-null   object 
 5   Transmission           7252 non-null   object 
 6   Owner_Type             7252 non-null   object 
 7   Mileage                7250 non-null   float64
 8   Engine                 7206 non-null   float64
 9   Power                  7077 non-null   float64
 10  Seats                  7199 non-null   object 
 11  New_price              1006 non-null   float64
 12  Price                  6018 non-null   float64
 13  kilometers_driven_log  7252 non-null   float64
 14  price_log              6018 non-null   float64
 15  new_price_log          1006 non-null   float64
dtypes: float64(8), int64(1), object(7)
memory usage: 963.2+ KB

Observations and Insights: _

Mileage is negatively correlated with a number of variables, especially Engine, new_price_log and Power Engine is strongly positively correlated with new_price_log, Power and Engine

3. Box plot¶

In [47]:
# Let us write a function that will help us create boxplot w.r.t Price for any input categorical variable.
# This function takes the categorical column as the input and returns the boxplots for the variable.
def boxplot(z):
    
    plt.figure(figsize = (12, 5)) # Setting size of boxplot
    
    sns.boxplot(x = z, y = df['Price']) # Defining x and y
    
    plt.show()
    
    plt.figure(figsize = (12, 5))
    
    plt.title('Without Outliers')
    
    sns.boxplot(x = z, y = df['Price'], showfliers = False) # Turning off the outliers
    
    plt.show()

Let us now plot bivariate analysis of target variable with a categorical variable 'Location'¶

In [48]:
# Box Plot: Price vs Location
boxplot(df['Location'])

Note: Explore by plotting box-plots for target variable and the other categorical variables like Fuel_Type, transmission, Owner_type.

In [49]:
boxplot(df['Fuel_Type'])
boxplot(df['Owner_Type'])
boxplot(df['Year'])
boxplot(df['Seats'])
boxplot(df['Transmission']) 

Observations and Insights for all plots:__

  1. Automatic cars are more expensive than manual.

  2. 2-seaters are the most expensive cars. This could be that they are sports cars, which are luxury items. This is followed by 4 seater cars. We need domain knowledge to understand why.

  3. Newer cars are more pricey. Not surprising.

  4. First owner cars are more expensive. These would tend to be newer cars. So this also makes sense.

  5. Diesel cars are more expensive than petrol.

Feature Engineering¶

Think about it: The Name column in the current format might not be very useful in our analysis. Since the name contains both the brand name and the model name of the vehicle, the column would have too many unique values to be useful in prediction. Can we extract that information from that column?

  • Hint: With 2041 unique names, car names are not going to be great predictors of the price in our current data. But we can process this column to extract important information for example brand name.
In [50]:
# Car name has both brand and model.  
# We extract it here,as this will help to fill missing values of price column as brand 
df['Brand'] = df['Name'].str.split(' ').str[0]  #Separating Brand name from the Name
df['Model'] = df['Name'].str.split(' ').str[1] + df['Name'].str.split(' ').str[2]

#check
df.head().T
Out[50]:
0 1 2 3 4
Name Maruti Wagon R LXI CNG Hyundai Creta 1.6 CRDi SX Option Honda Jazz V Maruti Ertiga VDI Audi A4 New 2.0 TDI Multitronic
Location Mumbai Pune Chennai Chennai Coimbatore
Year 2010 2015 2011 2012 2013
Kilometers_Driven 72000 41000 46000 87000 40670
Fuel_Type CNG Diesel Petrol Diesel Diesel
Transmission Manual Manual Manual Manual Automatic
Owner_Type First First First First Second
Mileage 26.6 19.67 18.2 20.77 15.2
Engine 998.0 1582.0 1199.0 1248.0 1968.0
Power 58.16 126.2 88.7 88.76 140.8
Seats 5.0 5.0 5.0 7.0 5.0
New_price NaN NaN 8.61 NaN NaN
Price 1.75 12.5 4.5 6.0 17.74
kilometers_driven_log 11.184421 10.621327 10.736397 11.373663 10.613246
price_log 0.559616 2.525729 1.504077 1.791759 2.875822
new_price_log NaN NaN 2.152924 NaN NaN
Brand Maruti Hyundai Honda Maruti Audi
Model WagonR Creta1.6 JazzV ErtigaVDI A4New
In [51]:
# Now lets check for unique names
df.Brand.unique()
Out[51]:
array(['Maruti', 'Hyundai', 'Honda', 'Audi', 'Nissan', 'Toyota',
       'Volkswagen', 'Tata', 'Land', 'Mitsubishi', 'Renault',
       'Mercedes-Benz', 'BMW', 'Mahindra', 'Ford', 'Porsche', 'Datsun',
       'Jaguar', 'Volvo', 'Chevrolet', 'Skoda', 'Mini', 'Fiat', 'Jeep',
       'Smart', 'Ambassador', 'Isuzu', 'ISUZU', 'Force', 'Bentley',
       'Lamborghini', 'Hindustan', 'OpelCorsa'], dtype=object)
In [52]:
#There seems to be an issue with some unique names. 
#Isuzu and ISUZU are the same thing
#Land should be Land Rover, which has a number of models
#Mini also has a number of models
col=['ISUZU','Isuzu','Mini','Land']

#Lets take a snippet and check out our suspicions
df[df.Brand.isin(col)].sample(5)
Out[52]:
Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats New_price Price kilometers_driven_log price_log new_price_log Brand Model
2604 Mini Cooper Convertible S Mumbai 2016 15000 Petrol Automatic First 16.82 1998.0 189.08 4.0 44.28 35.00 9.615805 3.555348 3.790533 Mini CooperConvertible
5545 Land Rover Range Rover Sport SE Delhi 2014 47000 Diesel Automatic Second 12.65 2993.0 255.00 5.0 139.00 64.75 10.757903 4.170534 4.934474 Land RoverRange
1460 Land Rover Range Rover Sport 2005 2012 Sport Coimbatore 2008 69078 Petrol Manual First 0.00 NaN NaN NaN NaN 40.88 11.142992 3.710641 NaN Land RoverRange
2073 Mini Cooper 5 DOOR D Hyderabad 2017 2000 Diesel Automatic First 20.70 1496.0 113.98 5.0 42.48 34.00 7.600902 3.526361 3.749033 Mini Cooper5
718 Mini Cooper S Pune 2012 37000 Petrol Automatic Second 13.60 1598.0 181.00 4.0 NaN 17.00 10.518673 2.833213 NaN Mini CooperS
In [53]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 18 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   7252 non-null   object 
 1   Location               7252 non-null   object 
 2   Year                   7252 non-null   object 
 3   Kilometers_Driven      7252 non-null   int64  
 4   Fuel_Type              7252 non-null   object 
 5   Transmission           7252 non-null   object 
 6   Owner_Type             7252 non-null   object 
 7   Mileage                7250 non-null   float64
 8   Engine                 7206 non-null   float64
 9   Power                  7077 non-null   float64
 10  Seats                  7199 non-null   object 
 11  New_price              1006 non-null   float64
 12  Price                  6018 non-null   float64
 13  kilometers_driven_log  7252 non-null   float64
 14  price_log              6018 non-null   float64
 15  new_price_log          1006 non-null   float64
 16  Brand                  7252 non-null   object 
 17  Model                  7251 non-null   object 
dtypes: float64(8), int64(1), object(9)
memory usage: 1.1+ MB
In [54]:
# Let's change all brandnames so that they are common in the database
df.loc[df.Brand == 'ISUZU','Brand']='Isuzu'
df.loc[df.Brand=='Mini','Brand']='Mini Cooper'
df.loc[df.Brand=='Land','Brand']='Land Rover'
In [55]:
df.Brand.nunique()
Out[55]:
32
In [56]:
df.groupby(df.Brand).size().sort_values(ascending =False)
Out[56]:
Brand
Maruti           1444
Hyundai          1340
Honda             743
Toyota            507
Mercedes-Benz     380
Volkswagen        374
Ford              351
Mahindra          331
BMW               311
Audi              285
Tata              228
Skoda             202
Renault           170
Chevrolet         151
Nissan            117
Land Rover         67
Jaguar             48
Fiat               38
Mitsubishi         36
Mini Cooper        31
Volvo              28
Jeep               19
Porsche            19
Datsun             17
Isuzu               5
Force               3
Bentley             2
Lamborghini         1
OpelCorsa           1
Hindustan           1
Smart               1
Ambassador          1
dtype: int64
In [57]:
df.Model.isnull().sum()
Out[57]:
1
In [58]:
# We notice there is one model missing from the dataset.  Let's drop that row.
df.dropna(subset=['Model'],axis=0,inplace=True)
df.Model.nunique()
Out[58]:
726
In [59]:
#Let's examine the most popular model
df.groupby('Model')['Model'].size().nlargest(30)
Out[59]:
Model
SwiftDzire      189
Grandi10        179
WagonR          178
Innova2.5       145
Verna1.6        127
City1.5         122
Cityi           115
Creta1.6        110
NewC-Class      110
3Series         109
SwiftVDI         96
5Series          86
i201.2           78
SantroXing       76
XUV500W8         75
i10Sportz        75
AmazeS           69
i10Magna         69
Alto800          63
CorollaAltis     63
FigoDiesel       61
Ecosport1.5      59
A42.0            56
AltoK10          56
VitaraBrezza     55
i20Asta          54
InnovaCrysta     53
i20Sportz        53
Duster110PS      51
Fortuner4x2      50
Name: Model, dtype: int64

Observations and Insights: _¶

There are 32 unique brands in this dataset. Maruti and Hyundai dominate in terms of sales. There are 726 unique models in the dataset. The most popular brand of car is SwiftDzire, followed by Grandi10 and WagonR.

Missing value treatment¶

In [60]:
# Now check the missing values of each column. Hint: Use isnull() method
df.isnull().sum()
Out[60]:
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     2
Engine                     46
Power                     175
Seats                      53
New_price                6245
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6245
Brand                       0
Model                       0
dtype: int64
In [61]:
#Let's see this graphically
msno.bar(df)
Out[61]:
<AxesSubplot:>

Observations and Insights: _¶

  1. Lots of missing values in the dataset.
  2. Importantly, price_log has 1234 missing points.

Missing values in Seats

In [62]:
# Checking missing values in the column 'Seats'

df['Seats'].isnull().sum()
Out[62]:
53

Think about it: Can we somehow use the extracted information from 'Name' column to impute missing values?

Hint: Impute these missing values one by one, by taking median number of seats for the particular car, using the Brand and Model name.

In [63]:
#Group by Name to determine values
df['Seats']=df.groupby(['Name'])['Seats'].apply(lambda x:x.fillna(x.median()))
df['Seats'].isnull().sum()
Out[63]:
46
In [64]:
#Now let's try grouping by Model
df['Seats']=df.groupby(['Model'])['Seats'].apply(lambda x:x.fillna(x.median()))
df['Seats'].isnull().sum()
Out[64]:
22
In [65]:
#Let's check now which car values are missing 
df[df['Seats'].isnull()==True].head(10)
Out[65]:
Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats New_price Price kilometers_driven_log price_log new_price_log Brand Model
208 Maruti Swift 1.3 VXi Kolkata 2010 42001 Petrol Manual First 16.1 NaN NaN NaN NaN 2.11 10.645449 0.746688 NaN Maruti Swift1.3
733 Maruti Swift 1.3 VXi Chennai 2006 97800 Petrol Manual Third 16.1 NaN NaN NaN NaN 1.75 11.490680 0.559616 NaN Maruti Swift1.3
1327 Maruti Swift 1.3 ZXI Hyderabad 2015 50295 Petrol Manual First 16.1 NaN NaN NaN NaN 5.80 10.825661 1.757858 NaN Maruti Swift1.3
2074 Maruti Swift 1.3 LXI Pune 2011 24255 Petrol Manual First 16.1 NaN NaN NaN NaN 3.15 10.096378 1.147402 NaN Maruti Swift1.3
2325 Maruti Swift 1.3 VXI ABS Pune 2015 67000 Petrol Manual First 16.1 NaN NaN NaN NaN 4.70 11.112448 1.547563 NaN Maruti Swift1.3
2335 Maruti Swift 1.3 VXi Mumbai 2007 55000 Petrol Manual Second 16.1 NaN NaN NaN NaN 1.75 10.915088 0.559616 NaN Maruti Swift1.3
2369 Maruti Estilo LXI Chennai 2008 56000 Petrol Manual Second 19.5 1061.0 NaN NaN NaN 1.50 10.933107 0.405465 NaN Maruti EstiloLXI
2668 Maruti Swift 1.3 VXi Kolkata 2014 32986 Petrol Manual First 16.1 NaN NaN NaN NaN 4.24 10.403839 1.444563 NaN Maruti Swift1.3
3404 Maruti Swift 1.3 VXi Jaipur 2006 125000 Petrol Manual Fourth & Above 16.1 NaN NaN NaN NaN 2.35 11.736069 0.854415 NaN Maruti Swift1.3
3810 Honda CR-V AT With Sun Roof Kolkata 2013 27000 Petrol Automatic First 14.0 NaN NaN NaN NaN 11.99 10.203592 2.484073 NaN Honda CR-VAT
In [66]:
# Now check total number of missing values of the seat column to verify if they are imputed or not. Hint: Use isnull() method

df['Seats'].isnull().sum()
Out[66]:
22
In [67]:
#Doing a Google search:  https://www.cardekho.com/maruti/swift/specs, we see that the Maruti Swift 1.3 is a 5 seater.
#Doing a Google search:  https://www.cardekho.com/overview/Maruti_Zen_Estilo/Maruti_Zen_Estilo_LXI_BS_IV.htm, is also a 5 seater
#Impute them all with 5 seats
df['Seats']=df['Seats'].fillna(5)

#Check if imputed
df['Seats'].isnull().sum()
Out[67]:
0

Missing values for Mileage

In [68]:
#Let's look at how many missing values there are for Engine, Power and Mileage
col=['Engine','Power','Mileage']
df[col].isnull().sum()
Out[68]:
Engine      46
Power      175
Mileage      2
dtype: int64
In [69]:
#Let's start filling missing values by grouping Name and Year and fill in missing values
#with median.

df.groupby(['Name','Year'])['Engine'].median().head(30)
df['Engine']=df.groupby(['Name','Year'])['Engine'].apply(lambda x:x.fillna(x.median()))
df['Power']=df.groupby(['Name','Year'])['Power'].apply(lambda x:x.fillna(x.median()))
df['Mileage']=df.groupby(['Name','Year'])['Mileage'].apply(lambda x:x.fillna(x.median()))
In [70]:
col=['Engine','Power','Mileage']
df[col].isnull().sum()
Out[70]:
Engine      45
Power      162
Mileage      2
dtype: int64
In [71]:
#Let's look at each unique combination of Brand and Model and display the top 10 results.

df.groupby(['Brand','Model'])['Engine'].median().head(10)
Out[71]:
Brand       Model      
Ambassador  ClassicNova    1489.0
Audi        A335           1968.0
            A41.8          1781.0
            A42.0          1968.0
            A43.0          2967.0
            A43.2          3197.0
            A430           1395.0
            A435           1968.0
            A4New          1968.0
            A62.0          1968.0
Name: Engine, dtype: float64
In [72]:
# Now check missing values of each column. Hint: Use isnull() method
df.isnull().sum()
Out[72]:
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     2
Engine                     45
Power                     162
Seats                       0
New_price                6245
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6245
Brand                       0
Model                       0
dtype: int64
In [73]:
# Impute missing Mileage. For example, use can use median or any other methods.
# This was treated above.
df['Mileage'].isnull().sum()
Out[73]:
2
In [74]:
#Since this is 2 records, we can drop them
df.dropna(subset=['Mileage'],axis=0,inplace=True)
df.isnull().sum()
Out[74]:
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                     45
Power                     162
Seats                       0
New_price                6244
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6244
Brand                       0
Model                       0
dtype: int64

Missing values for Engine

In [75]:
df['Engine'].isnull().sum()
Out[75]:
45
In [76]:
#Let's look at median, mean and max for Engine to see if we can impute with one of these values.

df.groupby(['Model','Year'])['Engine'].agg({'median','mean','max'}).sort_values(by='Model',ascending=True).head(10)
Out[76]:
max mean median
Model Year
1000AC 1998 970.0 970.000000 970.0
1Series 2013 1995.0 1995.000000 1995.0
2015 1995.0 1995.000000 1995.0
370ZAT 2012 3696.0 3696.000000 3696.0
3Series 2018 1995.0 1995.000000 1995.0
2017 1995.0 1995.000000 1995.0
2016 1995.0 1995.000000 1995.0
2015 1995.0 1995.000000 1995.0
2014 2993.0 2078.166667 1995.0
2013 2993.0 2066.428571 1995.0
In [77]:
df.isnull().sum()
Out[77]:
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                     45
Power                     162
Seats                       0
New_price                6244
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6244
Brand                       0
Model                       0
dtype: int64
In [78]:
#Let's impute these records with median=1995

cols1 = ["Engine"]

for ii in cols1:
    df[ii] = df[ii].fillna(df[ii].median())
In [79]:
#Check if imputed
df['Engine'].isnull().sum()
Out[79]:
0

Missing values for Power

In [80]:
df['Power'].isnull().sum()
Out[80]:
162
In [81]:
df.groupby(['Model','Year'])['Power'].agg({'median','mean','max'}).sort_values(by='Model',ascending=True).head(10)
Out[81]:
max mean median
Model Year
1000AC 1998 NaN NaN NaN
1Series 2013 143.0 143.000000 143.0
2015 143.0 143.000000 143.0
370ZAT 2012 328.5 328.500000 328.5
3Series 2018 190.0 188.000000 190.0
2017 190.0 188.820000 190.0
2016 190.0 189.333333 190.0
2015 190.0 185.981429 184.0
2014 245.0 189.666667 184.0
2013 245.0 191.785714 184.0
In [82]:
cols1 = ['Power']

for ii in cols1:
    df[ii] = df[ii].fillna(df[ii].median())
In [83]:
df['Power'].isnull().sum()
Out[83]:
0

Missing values for New_price

In [84]:
df['New_price'].isnull().sum()
Out[84]:
6244
In [89]:
df.isnull().sum()
Out[89]:
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                      0
Power                       0
Seats                       0
New_price                6244
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6244
Brand                       0
Model                       0
dtype: int64
In [90]:
#We will drop the records with New_price.
#We will also drop records of the target variable Price so as not to introduce bias.

df.drop(columns=["new_price_log"], inplace = True, axis = 1)
df.drop(columns=["New_price"], inplace = True, axis = 1)
df.shape
Out[90]:
(7249, 16)
In [92]:
df.isnull().sum()
Out[92]:
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                      0
Power                       0
Seats                       0
Price                    1233
kilometers_driven_log       0
price_log                1233
Brand                       0
Model                       0
dtype: int64
In [93]:
df.shape
Out[93]:
(7249, 16)
In [95]:
df=df.dropna()
In [96]:
df.shape
Out[96]:
(6016, 16)

Observations for missing values after imputing: _

We successfully imputed Power, Engine, Mileage and Seats. However, there are a significant number of missing values for New_price. The column was dropped.

Proposed approach¶

  • Potential techniques - What different techniques should be explored?
  • Overall solution design - What is the potential solution design?
  • Measures of success - What are the key measures of success?

POTENTIAL TECHNIQUES

There is no strict sequence that we must absolutely follow when applying machine learning techniques to a problem. However, there are some general guidelines that can be used to guide the process:

  1. We need to ensure that the data is clean. We did a lot of the work in this Milestonewhere we did EDA, univariate analysis on numerical and categorial data, bivariate data using scatterplots, box plots and heat map. We also did feature engineering where we imputed missing values.

  2. First we need to understand the problem that we are trying to solve and the type data we are working with, as well as the type of outcome we want to predict. In our situation, we are predicting a price, therefore the final method should provide a single value.

  3. We will start with a simple technique - linear regression or decision trees, which is easy to understand and interpret. This will provide a baseline for performance and a starting point for further experimentation.

  4. Next, we can try ensemble methods like Random Forest which can possibly improve performance and reduce overfitting compared to individual decision trees.

  5. In case we are overfitting, we can use regularization techniques like Ridge or Lasso regression. This can also be used to improve the performance and generalization of the model.

  6. We could try more complex methods if the performance of the simpler techniques is not satisfactory such as neural networks. But this is unlikely since this is the Machine Learning capstone.

  7. Finally, we need to evaluate the performance of the different techniques using appropriate metrics, such as accuracy or mean squared error. Select the technique that performs best on our specific problem.

OVERALL SOLUTION DESIGN

We will use the following process for the design of our solution:

  1. EDA: examine summary statistics on numerical and categorical values to understand the data. This is actually an important step in which a deep understanding of the definiton of each variable is considered.

  2. Perform univariate analysis on numerical and categorial data to understand the shape of each variable and to ensure that the distributions are normal. If not normal, then perform log transformation.

  3. Perform bivariate data using scatterplots, bos plots and heat map to discover the relationships between the variables.

  4. Perform feature engineering where missing values are discovered and imputed.

  5. Clean the data and prepare it for linear regression.

  6. Separate the dependent and independt variables, and then split the data into training and test.

  7. Looking ahead to Milestone 2, we will build our supervised learning models: 1) Linear Regression, 2) Ridge/Lasso Regression, 3) Decision Trees, 4) Random Forest

  8. We will refine our insights to uncover the most meaningful insights relevent to the problem.

  9. We will provide a comparison of the various techniques and their relative performance. That is, we will determine how they performed, which one is better relative to others, is there scope for improvement.

  10. Finally, we will make a proposal as to which model should be adoped. We will stipulate why this is the best solution For Cars4U to adopt.

MEASURES OF SUCCESS

We can provide some general guidelines for interpreting some of the most commonly used metrics for success, but it will depend on what we find in the various models

  1. Mean Absolute Error (MAE): A smaller value for MAE indicates a better fit of the model to the data. A value of 0 would indicate a perfect fit.

  2. Mean Squared Error (MSE): Similar to MAE, a smaller value for MSE indicates a better fit of the model to the data. A value of 0 would indicate a perfect fit.

  3. Root Mean Squared Error (RMSE): Similar to MSE, a smaller value for RMSE indicates a better fit of the model to the data. A value of 0 would indicate a perfect fit.

  4. R-squared: This metric ranges from 0 to 1, with 1 indicating a perfect fit of the model to the data. Values close to 1 indicate a good fit, while values close to 0 indicate a poor fit.

  5. Accuracy: This metric ranges from 0 to 1, with 1 indicating that all instances were correctly classified. A value of 0.8 or higher is often considered good, but again it depends on the problem and the cost of false positives and false negatives.

  6. Precision: This metric ranges from 0 to 1, with 1 indicating that all positive predictions were correct. A value of 0.8 or higher is often considered good.

  7. Recall: This metric ranges from 0 to 1, with 1 indicating that all actual positive instances were correctly identified. A value of 0.8 or higher is often considered good.

  8. F1-score: This metric ranges from 0 to 1, with 1 indicating a perfect balance of precision and recall. A value of 0.8 or higher is often considered good.

Saving the Data¶

Please save the pre-processed dataset into a separate file so that we can continue without having to repeat the work we did in Milestone1. The stored data frame can be loaded into Milestone2 and implemented further.

To save the pre-processed data frame, please follow the below lines of code:

In [94]:
# Assume df_cleaned is the pre-processed data frame in your code, then
df_cleaned=df
df_cleaned.to_csv("cars_data_updated.csv", index = False)

The above code helps to save the cleaned/pre-processed dataset into csv file, that can be further loaded into Milestone2.

Milestone 2¶

Model Building¶

  1. What we want to predict is the "Price". We will use the normalized version 'price_log' for modeling.
  2. Before we proceed to the model, we'll have to encode categorical features. We will drop categorical features like Name.
  3. We'll split the data into train and test, to be able to evaluate the model that we build on the train data.
  4. Build Regression models using train data.
  5. Evaluate the model performance.

Note: Please load the data frame that was saved in Milestone 1 here before separating the data, and then proceed to the next step in Milestone 2.

Data Preparation for Model Building¶

Load the data¶

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn import metrics
from sklearn.metrics import mean_squared_error
from sklearn.metrics import accuracy_score

#Import library for plotting data. 
import matplotlib.pyplot as plt

#to ignore warnings
import warnings
warnings.filterwarnings('ignore')
In [2]:
cars_data = pd.read_csv("cars_data_updated.csv")
In [3]:
cars_data.shape
Out[3]:
(7249, 16)
In [4]:
cars_data.head()
Out[4]:
Name Location Year Kilometers_Driven Fuel_Type Transmission Owner_Type Mileage Engine Power Seats Price kilometers_driven_log price_log Brand Model
0 Maruti Wagon R LXI CNG Mumbai 2010 72000 CNG Manual First 26.60 998.0 58.16 5.0 1.75 11.184421 0.559616 Maruti WagonR
1 Hyundai Creta 1.6 CRDi SX Option Pune 2015 41000 Diesel Manual First 19.67 1582.0 126.20 5.0 12.50 10.621327 2.525729 Hyundai Creta1.6
2 Honda Jazz V Chennai 2011 46000 Petrol Manual First 18.20 1199.0 88.70 5.0 4.50 10.736397 1.504077 Honda JazzV
3 Maruti Ertiga VDI Chennai 2012 87000 Diesel Manual First 20.77 1248.0 88.76 7.0 6.00 11.373663 1.791759 Maruti ErtigaVDI
4 Audi A4 New 2.0 TDI Multitronic Coimbatore 2013 40670 Diesel Automatic Second 15.20 1968.0 140.80 5.0 17.74 10.613246 2.875822 Audi A4New
In [5]:
# This drops all records that have null values in price_log
cars_data.dropna(subset=["price_log"], inplace=True)
In [6]:
#check data types
cars_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6016 entries, 0 to 6015
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   6016 non-null   object 
 1   Location               6016 non-null   object 
 2   Year                   6016 non-null   int64  
 3   Kilometers_Driven      6016 non-null   int64  
 4   Fuel_Type              6016 non-null   object 
 5   Transmission           6016 non-null   object 
 6   Owner_Type             6016 non-null   object 
 7   Mileage                6016 non-null   float64
 8   Engine                 6016 non-null   float64
 9   Power                  6016 non-null   float64
 10  Seats                  6016 non-null   float64
 11  Price                  6016 non-null   float64
 12  kilometers_driven_log  6016 non-null   float64
 13  price_log              6016 non-null   float64
 14  Brand                  6016 non-null   object 
 15  Model                  6016 non-null   object 
dtypes: float64(7), int64(2), object(7)
memory usage: 799.0+ KB
In [7]:
cars_data.shape
Out[7]:
(6016, 16)
In [8]:
#count of unique features
cars_data.nunique()
Out[8]:
Name                     1874
Location                   11
Year                       22
Kilometers_Driven        3092
Fuel_Type                   4
Transmission                2
Owner_Type                  4
Mileage                   430
Engine                    145
Power                     369
Seats                       8
Price                    1373
kilometers_driven_log    3092
price_log                1373
Brand                      30
Model                     687
dtype: int64
In [9]:
## Check total number of missing values of each column. Hint: Use isnull() method
cars_data.isnull().sum()
Out[9]:
Name                     0
Location                 0
Year                     0
Kilometers_Driven        0
Fuel_Type                0
Transmission             0
Owner_Type               0
Mileage                  0
Engine                   0
Power                    0
Seats                    0
Price                    0
kilometers_driven_log    0
price_log                0
Brand                    0
Model                    0
dtype: int64
In [10]:
# Visualization of correlation
# import pandas as pd  - already imported at the beginning
import seaborn as sns

# calculate the correlation matrix
corr = cars_data.corr()

# create a heatmap of the correlation matrix
sns.heatmap(corr, annot=True)
Out[10]:
<AxesSubplot:>

Split the Data¶

  • Step1: Seperating the indepdent variables (X) and the dependent variable (y).

  • Step2: Encode the categorical variables in X using pd.dummies.

  • Step3: Split the data into train and test using train_test_split.

  • Think about it: Why we should drop 'Name','Price','price_log','Kilometers_Driven' from X before splitting?

    SPLITTING THE DATA In linear regression, we are trying to build a model to predict the value of a dependent variable based on the values of one or more independent variables. When splitting the data into training and testing sets, it's important to drop the dependent variable from the dataset we use for training the model. This is because the model should not have access to the dependent variable during the training process, as it would not be reflective of real-world scenarios when we use the model to make predictions on new, unseen data. If the dependent variable is included in the training data, the model may simply memorize the data rather than learn the underlying relationships between the independent and dependent variables. By only using the independent variables to train the model, we can ensure that it has learned the underlying relationships and is able to make accurate predictions on new data.

    Name should be dropped as it has too many unique values and we will use Brand and/or Model instead. Price and price_log should be dropped when building the model Kilometers_Driven is replaced by its log

    TESTING THE DATA Note that the dependent variable data is typically only used during the evaluation of the model. This means that after the model has been trained on the training data, which does not include the target variable, the model is then tested on a separate set of data called the testing data. The testing data should include the target variable, so that we can compare the predictions made by the model to the actual values of the target variable in the testing data. This allows us to evaluate the performance of the model and determine how well it is able to make predictions on new, unseen data. The evaluation metric used could be mean squared error, R squared, adjusted R squared etc. In summary, the dependent variable data is not used during the training process, but is used during the evaluation process to measure the model's performance.

    In [11]:
    # Step-1
    X = cars_data.drop(['Name','Price','price_log','Kilometers_Driven'], axis = 1)
    y = cars_data[["price_log", "Price"]]
    
    In [12]:
    X.info()
    
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 6016 entries, 0 to 6015
    Data columns (total 12 columns):
     #   Column                 Non-Null Count  Dtype  
    ---  ------                 --------------  -----  
     0   Location               6016 non-null   object 
     1   Year                   6016 non-null   int64  
     2   Fuel_Type              6016 non-null   object 
     3   Transmission           6016 non-null   object 
     4   Owner_Type             6016 non-null   object 
     5   Mileage                6016 non-null   float64
     6   Engine                 6016 non-null   float64
     7   Power                  6016 non-null   float64
     8   Seats                  6016 non-null   float64
     9   kilometers_driven_log  6016 non-null   float64
     10  Brand                  6016 non-null   object 
     11  Model                  6016 non-null   object 
    dtypes: float64(5), int64(1), object(6)
    memory usage: 611.0+ KB
    
    In [13]:
    y.info()
    
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 6016 entries, 0 to 6015
    Data columns (total 2 columns):
     #   Column     Non-Null Count  Dtype  
    ---  ------     --------------  -----  
     0   price_log  6016 non-null   float64
     1   Price      6016 non-null   float64
    dtypes: float64(2)
    memory usage: 141.0 KB
    
    In [14]:
    # Check total number of missing values of each column in X. Hint: Use isnull() method
    X.isnull().sum()
    
    Out[14]:
    Location                 0
    Year                     0
    Fuel_Type                0
    Transmission             0
    Owner_Type               0
    Mileage                  0
    Engine                   0
    Power                    0
    Seats                    0
    kilometers_driven_log    0
    Brand                    0
    Model                    0
    dtype: int64
    In [15]:
    # Check total number of missing values of each column in y. Hint: Use isnull() method
    y.isnull().sum()
    
    Out[15]:
    price_log    0
    Price        0
    dtype: int64
    In [16]:
    # Step-2 Use pd.get_dummies(drop_first = True)
    X = pd.get_dummies(X, drop_first = True)
    
    In [17]:
    X.head()
    
    Out[17]:
    Year Mileage Engine Power Seats kilometers_driven_log Location_Bangalore Location_Chennai Location_Coimbatore Location_Delhi ... Model_i201.4 Model_i202015-2017 Model_i20Active Model_i20Asta Model_i20Diesel Model_i20Era Model_i20Magna Model_i20Sportz Model_redi-GOS Model_redi-GOT
    0 2010 26.60 998.0 58.16 5.0 11.184421 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    1 2015 19.67 1582.0 126.20 5.0 10.621327 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    2 2011 18.20 1199.0 88.70 5.0 10.736397 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
    3 2012 20.77 1248.0 88.76 7.0 11.373663 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
    4 2013 15.20 1968.0 140.80 5.0 10.613246 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0

    5 rows × 738 columns

    In [18]:
    #Check shape of X
    X.shape
    
    Out[18]:
    (6016, 738)
    In [19]:
    #Check shape of y
    y.shape
    
    Out[19]:
    (6016, 2)
    In [20]:
    # Import library for preparing data
    from sklearn.model_selection import train_test_split
    
    In [21]:
    # Step-3 Splitting data into training and test set:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    
    (4211, 738) (1805, 738) (4211, 2) (1805, 2)
    

    Next, we define the function that we will use to evaluate each model created in this notebook. We use this model to determine the best solution to our business problem.

    1) The get_model_score function uses R2 2) The get_model_score_adjusted_R2 uses Adjusted R2

    This code perform evaluation of a given regression model. The function takes the regression model as an input and returns a list of four scores: the training set R-square, the test set R-square, the training set RMSE, and the test set RMSE.

    The code first makes predictions on the training and test sets using the input model, and then uses the R-square and RMSE metrics from the scikit-learn library to evaluate the performance of the model. The scores are then stored in the 'score_list' list and returned at the end of the function.

    Additionally, if the flag input is set to True (which is the default), the function will also print the R-square and RMSE scores for both the training and test sets.

    In [22]:
    #Let us write a function for calculating r2_score and RMSE on train and test data
    #This function takes model as an input on which we have trained particular algorithm
    #The categorical column as the input and returns the boxplots and histograms for the variable
    
    def get_model_score(model, flag = True):
        '''
        model : regressor to predict values of X
    
        '''
        # Defining an empty list to store train and test results
        score_list = [] 
        
        pred_train = model.predict(X_train)
        
        pred_train_ = np.exp(pred_train)
        
        pred_test = model.predict(X_test)
        
        pred_test_ = np.exp(pred_test)
        
        train_r2 = metrics.r2_score(y_train['Price'], pred_train_)
        
        test_r2 = metrics.r2_score(y_test['Price'], pred_test_)
        
        train_rmse = metrics.mean_squared_error(y_train['Price'], pred_train_, squared = False)
        
        test_rmse = metrics.mean_squared_error(y_test['Price'], pred_test_, squared = False)
    
       
        # Adding all scores in the list
        score_list.extend((train_r2, test_r2, train_rmse, test_rmse))
        
        # If the flag is set to True then only the following print statements will be dispayed, the default value is True
        if flag == True: 
            
            print("R-square on training set : ", metrics.r2_score(y_train['Price'], pred_train_))
            
            print("R-square on test set : ", metrics.r2_score(y_test['Price'], pred_test_))
            
            print("RMSE on training set : ", np.sqrt(metrics.mean_squared_error(y_train['Price'], pred_train_)))
            
            print("RMSE on test set : ", np.sqrt(metrics.mean_squared_error(y_test['Price'], pred_test_)))
        
        # Returning the list with train and test scores
        return score_list
    
    In [23]:
    # This function uses Adjusted R2
    
    def get_model_score_adjusted_R2(model, flag = True):
        '''
        model : regressor to predict values of X
        '''
        pred_train = model.predict(X_train)
        pred_train_ = np.exp(pred_train)
        pred_test = model.predict(X_test)
        pred_test_ = np.exp(pred_test)
        
        n = X_train.shape[0]
        p = X_train.shape[1]
        train_r2 = 1 - (1-metrics.r2_score(y_train['Price'], pred_train_))*(n-1)/(n-p-1)
        n = X_test.shape[0]
        p = X_test.shape[1]
        test_r2 = 1 - (1-metrics.r2_score(y_test['Price'], pred_test_))*(n-1)/(n-p-1)
        
        train_rmse = np.sqrt(metrics.mean_squared_error(y_train['Price'], pred_train_))
        test_rmse = np.sqrt(metrics.mean_squared_error(y_test['Price'], pred_test_))
        
        score_list_adjusted_R2 = [train_r2, test_r2, train_rmse, test_rmse]
        
        if flag:
            print("Adjusted R2 on training set : ", train_r2)
            print("Adjusted R2 on test set : ", test_r2)
            print("RMSE on training set : ", train_rmse)
            print("RMSE on test set : ", test_rmse)
        
        return score_list_adjusted_R2
    

    For Regression Problems, some of the algorithms used are :

    1) Linear Regression
    2) Ridge / Lasso Regression
    3) Decision Trees
    4) Random Forest

    Linear Regression¶

    Fitting a linear model¶

    Linear Regression can be implemented using:

    1) Sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
    2) Statsmodels: https://www.statsmodels.org/stable/regression.html

    LR - Sklearn¶

    In [24]:
    # Import Linear Regression from sklearn
    from sklearn.linear_model import LinearRegression
    
    # Create a linear regression model
    lr = LinearRegression()
    
    # Fit linear regression model
    lr.fit(X_train, y_train['price_log']) 
    
    Out[24]:
    LinearRegression()
    In [25]:
    # Get score of the model
    LR_score = get_model_score(lr)
    
    R-square on training set :  0.9632326167143936
    R-square on test set :  0.8884457821245254
    RMSE on training set :  2.1857914088366726
    RMSE on test set :  3.541569655485
    
    In [26]:
    print(LR_score)
    
    [0.9632326167143936, 0.8884457821245254, 2.1857914088366726, 3.541569655485]
    
    In [27]:
    X.shape
    
    Out[27]:
    (6016, 738)
    In [28]:
    y.shape
    
    Out[28]:
    (6016, 2)

    Checking for the assumptions and rebuilding the model

    1. Mean of residuals should be 0
    2. Normality of error terms
    3. Linearity of variables
    4. No heteroscedasticity
    1. Mean of Residuals. In order to meet the criteria, mean should be close to 0. The code fits an ordinary least squares (OLS) regression model using the statsmodels library in Python. The dependent variable (y_train) and independent variables (X_train) are inputted into the model. The model is then fit using the .fit() method and the summary of the fit is printed using .summary(). The summary provides information about the regression coefficients, goodness of fit measures, statistical tests, and diagnostics.
    In [29]:
    y_pred = lr.predict(X_test)
    residuals = y_test['price_log'] - y_pred
    print(residuals)
    
    5460    0.334171
    4367   -0.018583
    1227    0.166880
    2253   -0.359454
    79     -0.129142
              ...   
    188    -0.038821
    5218    0.135975
    3884    0.113695
    3978    0.024030
    2698    0.031614
    Name: price_log, Length: 1805, dtype: float64
    
    In [30]:
    residuals.mean()
    
    Out[30]:
    -0.010740112265732153
    1. Test for Normality of Error Terms. We do this by plotting a histogram of residuals. They should be normal.
    In [31]:
    sns.histplot(residuals, kde = True)
    
    Out[31]:
    <AxesSubplot:xlabel='price_log', ylabel='Count'>
    1. Linearity of Variables - they should be randomly and uniformly scattered on the x-axis. Predictor variables must have a linear relation with the dependent variable. To test this assumption, we plot the residuals and the fitted values and ensure that residuals do not form a strong pattern.
    In [32]:
    # Predicted values
    fitted = lr.predict(X_train)
    residuals = y_train['price_log'] - fitted
    
    #fitted = ols_res_1.fittedvalues
    sns.residplot(x = fitted, y = residuals, color = "lightblue")
    plt.xlabel("Fitted Values")
    plt.ylabel("Residual")
    plt.title("Residual PLOT")
    plt.show()
    
    1. No Heteroscedasticity/Test for Homoscedasticity. Homoscedasticity: if the variance of the residuals are symmetrically distributed across the regression line, then the data is said to homoscedastic. Heteroscedasticity: if the variance is unequal for the residuals across the regression line, then the data is said to be heteroscedastic. In this case, the residuals can form an arrow shape or any other non symmetrical shape.
    In [33]:
    import seaborn as sns
    
    y_pred = lr.predict(X_test)
    residuals = y_test['price_log'] - y_pred
    
    p = sns.scatterplot(y_pred,residuals)
    
    plt.xlabel('y_pred/predicted values')
    plt.ylabel('Residuals')
    plt.ylim(-5,5)
    plt.xlim(0,26)
    p = sns.lineplot([0,26],[0,0],color='blue')
    p = plt.title('Residuals vs fitted values plot for homoscedasticity check')
    
    In [133]:
    # get the coefficients of the linear regression model
    coefs = lr.coef_
    
    # create a dataframe to store the coefficients
    coef_df = pd.DataFrame({'Feature': X_train.columns, 'Coefficient': coefs})
    
    # sort the dataframe by the magnitude of the coefficients
    coef_df['Absolute_Coefficient'] = coef_df['Coefficient'].abs()
    coef_df.sort_values(by='Absolute_Coefficient', ascending=False, inplace=True)
    
    # print the feature coefficients
    print("Feature Coefficients: \n", coef_df)
    
    Feature Coefficients: 
                    Feature   Coefficient  Absolute_Coefficient
    162  Model_CayenneBase -3.224471e+00          3.224471e+00
    422    Model_MustangV8  1.383597e+00          1.383597e+00
    427      Model_NanoSTD -1.252193e+00          1.252193e+00
    352      Model_Ikon1.4 -1.231596e+00          1.231596e+00
    36   Brand_Lamborghini  1.159825e+00          1.159825e+00
    ..                 ...           ...                   ...
    177    Model_CiazAlpha  3.733125e-15          3.733125e-15
    700     Model_XUV500W9  5.551115e-17          5.551115e-17
    683       Model_XE2.0L  1.259030e-17          1.259030e-17
    715       Model_ZenLXI  0.000000e+00          0.000000e+00
    684  Model_XEPortfolio  0.000000e+00          0.000000e+00
    
    [738 rows x 3 columns]
    
    In [163]:
    original_columns = [col for col in X_train.columns if '_' not in col]
    coef_df = coef_df[coef_df['feature'].isin(original_columns)]
    top_5 = coef_df.sort_values(by='coef', ascending=False).head(7)
    
    plt.barh(top_5['feature'], top_5['coef'])
    plt.xlabel('Coefficient')
    plt.ylabel('Feature')
    plt.title('Top 5 Original Features and Their Coefficients')
    plt.show()
    

    Observations from results: _

    R-square on training set : 0.9632326167144044 and R-square on test set : 0.8884457821246294 are indicating that the model is performing well on both training and test set. A high R-squared value (closer to 1) indicates that the model is explaining a large proportion of the variance in the data. However, in this case, the R-square on the training set is higher than the R-square on the test set, which indicates that the model is overfitting the training data.

    RMSE on training set : 2.185791408836351 and RMSE on test set : 3.5415696554833485 are indicating the error of the model on both training and test set. RMSE (Root Mean Squared Error) is a measure of the difference between the predicted and actual values. The lower the RMSE, the better the model. In this case, the RMSE on the training set is lower than the RMSE on the test set, which indicates that the model is not generalizing well to unseen data.

    Overall, it seems like the model is overfitting the training data and is not generalizing well to unseen data.

    Important variables of Linear Regression

    Building a model using statsmodels.

    LR - Statsmodel 1 - R2¶

    In [34]:
    # Import Statsmodels 
    import statsmodels.api as sm
    
    # Statsmodel api does not add a constant by default. We need to add it explicitly
    x_train = sm.add_constant(X_train)
    
    # Add constant to test data
    x_test = sm.add_constant(X_test)
    
    def build_ols_model(train):
        
        # Create the model
        olsmodel = sm.OLS(y_train["price_log"], train)
        
        return olsmodel.fit()
    
    # Fit linear model on new dataset
    olsmodel1 = build_ols_model(X_train)
    
    print(olsmodel1.summary())
    
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:              price_log   R-squared:                       0.973
    Model:                            OLS   Adj. R-squared:                  0.969
    Method:                 Least Squares   F-statistic:                     204.2
    Date:                Thu, 02 Feb 2023   Prob (F-statistic):               0.00
    Time:                        12:22:32   Log-Likelihood:                 2207.0
    No. Observations:                4211   AIC:                            -3134.
    Df Residuals:                    3571   BIC:                             927.1
    Df Model:                         639                                         
    Covariance Type:            nonrobust                                         
    =============================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
    ---------------------------------------------------------------------------------------------
    Year                          0.0997      0.002     65.021      0.000       0.097       0.103
    Mileage                      -0.0039      0.002     -2.493      0.013      -0.007      -0.001
    Engine                     9.227e-06    2.9e-05      0.318      0.750   -4.76e-05     6.6e-05
    Power                         0.0014      0.000      4.087      0.000       0.001       0.002
    Seats                         0.0118      0.019      0.608      0.543      -0.026       0.050
    kilometers_driven_log        -0.0760      0.005    -14.577      0.000      -0.086      -0.066
    Location_Bangalore            0.1733      0.017     10.163      0.000       0.140       0.207
    Location_Chennai              0.0485      0.016      2.987      0.003       0.017       0.080
    Location_Coimbatore           0.1419      0.015      9.177      0.000       0.112       0.172
    Location_Delhi               -0.0932      0.016     -5.950      0.000      -0.124      -0.062
    Location_Hyderabad            0.1470      0.015      9.765      0.000       0.117       0.177
    Location_Jaipur              -0.0289      0.017     -1.745      0.081      -0.061       0.004
    Location_Kochi               -0.0098      0.015     -0.630      0.529      -0.040       0.021
    Location_Kolkata             -0.2266      0.016    -14.190      0.000      -0.258      -0.195
    Location_Mumbai              -0.0768      0.015     -5.106      0.000      -0.106      -0.047
    Location_Pune                -0.0337      0.016     -2.151      0.032      -0.064      -0.003
    Fuel_Type_Diesel              0.0166      0.031      0.530      0.596      -0.045       0.078
    Fuel_Type_LPG                -0.0655      0.076     -0.859      0.390      -0.215       0.084
    Fuel_Type_Petrol             -0.0912      0.032     -2.852      0.004      -0.154      -0.028
    Transmission_Manual          -0.0960      0.010     -9.375      0.000      -0.116      -0.076
    Owner_Type_Fourth & Above    -0.0864      0.074     -1.172      0.241      -0.231       0.058
    Owner_Type_Second            -0.0514      0.008     -6.637      0.000      -0.067      -0.036
    Owner_Type_Third             -0.1223      0.021     -5.920      0.000      -0.163      -0.082
    Brand_Audi                 -190.4469      3.009    -63.282      0.000    -196.347    -184.546
    Brand_BMW                  -187.0584      2.958    -63.244      0.000    -192.857    -181.259
    Brand_Bentley               -97.9801      1.556    -62.987      0.000    -101.030     -94.930
    Brand_Chevrolet            -189.1596      2.963    -63.834      0.000    -194.969    -183.350
    Brand_Datsun               -170.6479      2.668    -63.953      0.000    -175.879    -165.416
    Brand_Fiat                 -183.4456      2.871    -63.896      0.000    -189.075    -177.817
    Brand_Force                 -98.8717      1.560    -63.395      0.000    -101.930     -95.814
    Brand_Ford                 -192.2491      3.017    -63.715      0.000    -198.165    -186.333
    Brand_Honda                -194.2222      3.048    -63.726      0.000    -200.198    -188.247
    Brand_Hyundai              -194.8630      3.054    -63.807      0.000    -200.851    -188.875
    Brand_Isuzu                 -99.0671      1.560    -63.498      0.000    -102.126     -96.008
    Brand_Jaguar               -176.9958      2.802    -63.174      0.000    -182.489    -171.503
    Brand_Jeep                  -98.8187      1.559    -63.395      0.000    -101.875     -95.762
    Brand_Lamborghini           -97.7633      1.557    -62.777      0.000    -100.817     -94.710
    Brand_Land Rover           -147.6478      2.338    -63.164      0.000    -152.231    -143.065
    Brand_Mahindra             -193.8566      3.047    -63.621      0.000    -199.831    -187.882
    Brand_Maruti               -198.5188      3.096    -64.122      0.000    -204.589    -192.449
    Brand_Mercedes-Benz        -191.4421      3.026    -63.258      0.000    -197.376    -185.509
    Brand_Mini Cooper          -172.4000      2.723    -63.312      0.000    -177.739    -167.061
    Brand_Mitsubishi           -173.0238      2.722    -63.565      0.000    -178.361    -167.687
    Brand_Nissan               -185.1204      2.905    -63.729      0.000    -190.816    -179.425
    Brand_Porsche              -175.0380      2.768    -63.240      0.000    -180.465    -169.611
    Brand_Renault              -188.5606      2.957    -63.770      0.000    -194.358    -182.763
    Brand_Skoda                -192.2906      3.018    -63.721      0.000    -198.207    -186.374
    Brand_Smart                 -99.1499      1.556    -63.725      0.000    -102.200     -96.099
    Brand_Tata                 -194.0216      3.035    -63.925      0.000    -199.972    -188.071
    Brand_Toyota               -191.5153      3.012    -63.582      0.000    -197.421    -185.610
    Brand_Volkswagen           -190.8784      2.997    -63.682      0.000    -196.755    -185.002
    Brand_Volvo                -175.3870      2.768    -63.362      0.000    -180.814    -169.960
    Model_1Series               -10.4063      0.188    -55.323      0.000     -10.775     -10.037
    Model_3Series               -10.2546      0.155    -66.004      0.000     -10.559      -9.950
    Model_5Series                -9.9958      0.157    -63.744      0.000     -10.303      -9.688
    Model_6Series                -9.3836      0.174    -53.847      0.000      -9.725      -9.042
    Model_7Series                -9.5631      0.166    -57.505      0.000      -9.889      -9.237
    Model_800AC                  -0.7719      0.177     -4.371      0.000      -1.118      -0.426
    Model_800DX                1.018e-11    8.9e-13     11.441      0.000    8.44e-12    1.19e-11
    Model_800Std                 -0.5967      0.193     -3.097      0.002      -0.974      -0.219
    Model_A-StarAT               -0.2263      0.192     -1.178      0.239      -0.603       0.150
    Model_A-StarLxi              -0.1284      0.181     -0.710      0.478      -0.483       0.226
    Model_A-StarVxi              -0.0980      0.172     -0.571      0.568      -0.435       0.239
    Model_A335                   -6.9104      0.132    -52.516      0.000      -7.168      -6.652
    Model_A41.8                  -6.8775      0.149    -46.248      0.000      -7.169      -6.586
    Model_A42.0                  -6.8241      0.107    -63.648      0.000      -7.034      -6.614
    Model_A43.0                  -7.0016      0.135    -51.879      0.000      -7.266      -6.737
    Model_A43.2                8.799e-12   1.29e-12      6.820      0.000    6.27e-12    1.13e-11
    Model_A430                   -6.8209      0.186    -36.671      0.000      -7.186      -6.456
    Model_A435                   -6.8321      0.123    -55.571      0.000      -7.073      -6.591
    Model_A4New                  -6.9809      0.136    -51.180      0.000      -7.248      -6.713
    Model_A62.0                  -6.2528      0.185    -33.880      0.000      -6.615      -5.891
    Model_A62.7                  -6.8744      0.129    -53.346      0.000      -7.127      -6.622
    Model_A62.8                  -6.7536      0.182    -37.179      0.000      -7.110      -6.397
    Model_A62011-2015            -6.6467      0.111    -59.636      0.000      -6.865      -6.428
    Model_A63.0                  -6.9279      0.134    -51.887      0.000      -7.190      -6.666
    Model_A635                   -6.4954      0.127    -51.293      0.000      -6.744      -6.247
    Model_A72011-2015            -6.1878      0.188    -32.959      0.000      -6.556      -5.820
    Model_A8L                    -5.7975      0.184    -31.523      0.000      -6.158      -5.437
    Model_AClass                 -5.9945      0.114    -52.575      0.000      -6.218      -5.771
    Model_AccentCRDi             -3.9628      0.120    -32.909      0.000      -4.199      -3.727
    Model_AccentExecutive      3.258e-11   3.66e-12      8.890      0.000    2.54e-11    3.98e-11
    Model_AccentGLE              -3.9540      0.071    -55.455      0.000      -4.094      -3.814
    Model_AccentGLS              -3.8569      0.103    -37.375      0.000      -4.059      -3.655
    Model_Accord2.4              -3.7927      0.078    -48.905      0.000      -3.945      -3.641
    Model_Accord2001-2003        -3.6696      0.127    -28.955      0.000      -3.918      -3.421
    Model_AccordV6               -4.1531      0.168    -24.789      0.000      -4.482      -3.825
    Model_AccordVTi-L            -4.0594      0.128    -31.711      0.000      -4.310      -3.808
    Model_Alto800                -0.4585      0.160     -2.862      0.004      -0.773      -0.144
    Model_AltoGreen              -0.4692      0.223     -2.102      0.036      -0.907      -0.031
    Model_AltoK10                -0.3591      0.160     -2.247      0.025      -0.673      -0.046
    Model_AltoLX                 -0.6173      0.192     -3.215      0.001      -0.994      -0.241
    Model_AltoLXI              -3.96e-12   8.81e-13     -4.493      0.000   -5.69e-12   -2.23e-12
    Model_AltoLXi                -0.2642      0.160     -1.648      0.099      -0.579       0.050
    Model_AltoStd                -0.2983      0.191     -1.558      0.119      -0.674       0.077
    Model_AltoVXi             -5.404e-12   6.62e-13     -8.165      0.000    -6.7e-12   -4.11e-12
    Model_AltoVxi                 0.1386      0.221      0.626      0.531      -0.295       0.572
    Model_AmazeE                 -4.2098      0.085    -49.309      0.000      -4.377      -4.042
    Model_AmazeEX                -4.3100      0.110    -39.351      0.000      -4.525      -4.095
    Model_AmazeS                 -4.2362      0.071    -59.441      0.000      -4.376      -4.096
    Model_AmazeSX                -4.2757      0.101    -42.256      0.000      -4.474      -4.077
    Model_AmazeV                 -4.2274      0.127    -33.191      0.000      -4.477      -3.978
    Model_AmazeVX                -4.1739      0.077    -54.159      0.000      -4.325      -4.023
    Model_Ameo1.2                -7.6078      0.128    -59.497      0.000      -7.859      -7.357
    Model_Ameo1.5                -7.5577      0.147    -51.337      0.000      -7.846      -7.269
    Model_AspireAmbiente         -6.2611      0.181    -34.533      0.000      -6.617      -5.906
    Model_AspireTitanium         -5.9793      0.130    -45.825      0.000      -6.235      -5.723
    Model_Aveo1.4                -9.5678      0.170    -56.322      0.000      -9.901      -9.235
    Model_Aveo1.6                -9.7178      0.209    -46.450      0.000     -10.128      -9.308
    Model_AveoU-VA               -9.7213      0.160    -60.940      0.000     -10.034      -9.408
    Model_AvventuraMULTIJET     -14.9094      0.259    -57.580      0.000     -15.417     -14.402
    Model_BClass                 -6.0582      0.105    -57.530      0.000      -6.265      -5.852
    Model_BR-Vi-DTEC           6.159e-13   8.54e-13      0.721      0.471   -1.06e-12    2.29e-12
    Model_BR-Vi-VTEC             -4.0353      0.134    -30.145      0.000      -4.298      -3.773
    Model_BRVi-VTEC              -3.9152      0.104    -37.533      0.000      -4.120      -3.711
    Model_BalenoAlpha             0.2921      0.163      1.793      0.073      -0.027       0.612
    Model_BalenoDelta             0.1067      0.166      0.644      0.520      -0.218       0.431
    Model_BalenoLXI              -0.4686      0.192     -2.438      0.015      -0.845      -0.092
    Model_BalenoRS                0.2623      0.176      1.490      0.136      -0.083       0.607
    Model_BalenoSigma             0.0793      0.182      0.436      0.663      -0.277       0.436
    Model_BalenoVxi              -0.3108      0.192     -1.616      0.106      -0.688       0.066
    Model_BalenoZeta              0.1698      0.164      1.035      0.301      -0.152       0.492
    Model_BeatDiesel             -9.7274      0.154    -62.962      0.000     -10.030      -9.425
    Model_BeatLS                 -9.6412      0.159    -60.686      0.000      -9.953      -9.330
    Model_BeatLT                 -9.6850      0.155    -62.370      0.000      -9.989      -9.381
    Model_BeatOption           2.326e-12   7.66e-13      3.038      0.002    8.25e-13    3.83e-12
    Model_Beetle2.0              1.4e-12      8e-13      1.749      0.080   -1.69e-13    2.97e-12
    Model_BoleroDI               -4.3241      0.168    -25.667      0.000      -4.654      -3.994
    Model_BoleroSLE              -4.6119      0.170    -27.160      0.000      -4.945      -4.279
    Model_BoleroSLX              -4.3986      0.169    -25.971      0.000      -4.731      -4.067
    Model_BoleroVLX              -4.3062      0.170    -25.279      0.000      -4.640      -3.972
    Model_BoleroZLX              -4.3942      0.098    -44.822      0.000      -4.586      -4.202
    Model_BoleromHAWK          3.808e-12   7.56e-13      5.040      0.000    2.33e-12    5.29e-12
    Model_BoltQuadrajet          -4.6461      0.136    -34.071      0.000      -4.913      -4.379
    Model_BoltRevotron           -4.8586      0.172    -28.265      0.000      -5.196      -4.522
    Model_BoxsterS            -1.907e-12   7.73e-13     -2.468      0.014   -3.42e-12   -3.92e-13
    Model_Brio1.2                -4.3234      0.101    -42.859      0.000      -4.521      -4.126
    Model_BrioE                  -4.3443      0.166    -26.130      0.000      -4.670      -4.018
    Model_BrioEX                 -4.4702      0.167    -26.704      0.000      -4.798      -4.142
    Model_BrioS                  -4.3564      0.071    -60.978      0.000      -4.496      -4.216
    Model_BrioV                  -4.3053      0.090    -48.050      0.000      -4.481      -4.130
    Model_BrioVX                 -4.3074      0.088    -48.723      0.000      -4.481      -4.134
    Model_C-ClassProgressive     -5.8321      0.148    -39.441      0.000      -6.122      -5.542
    Model_CLA200                 -5.8348      0.104    -55.887      0.000      -6.039      -5.630
    Model_CLS-Class2006-2010     -5.2682      0.175    -30.183      0.000      -5.610      -4.926
    Model_CR-V2.0              2.984e-12   8.38e-13      3.559      0.000    1.34e-12    4.63e-12
    Model_CR-V2.0L               -3.4664      0.095    -36.526      0.000      -3.652      -3.280
    Model_CR-V2.4                -3.6665      0.090    -40.900      0.000      -3.842      -3.491
    Model_CR-V2.4L               -3.3975      0.101    -33.706      0.000      -3.595      -3.200
    Model_CR-VAT              -5.736e-13   8.67e-13     -0.662      0.508   -2.27e-12    1.13e-12
    Model_CR-VPetrol          -2.473e-12   6.82e-13     -3.626      0.000   -3.81e-12   -1.14e-12
    Model_CR-VRVi                -3.3289      0.126    -26.476      0.000      -3.575      -3.082
    Model_CR-VSport              -3.1921      0.168    -19.021      0.000      -3.521      -2.863
    Model_Camry2.5               -5.5548      0.186    -29.843      0.000      -5.920      -5.190
    Model_CamryA/T            -8.829e-13   5.96e-13     -1.482      0.138   -2.05e-12    2.85e-13
    Model_CamryHybrid            -5.5055      0.138    -40.038      0.000      -5.775      -5.236
    Model_CamryW2                -6.5387      0.182    -35.919      0.000      -6.896      -6.182
    Model_CamryW4                -6.6148      0.182    -36.353      0.000      -6.972      -6.258
    Model_CaptivaLT           -2.426e-13    8.2e-13     -0.296      0.767   -1.85e-12    1.37e-12
    Model_CaptivaLTZ             -9.1088      0.212    -42.900      0.000      -9.525      -8.693
    Model_Captur1.5              -9.4996      0.189    -50.314      0.000      -9.870      -9.129
    Model_Cayenne2009-2014      -21.8549      0.362    -60.438      0.000     -22.564     -21.146
    Model_CayenneBase           -25.2074      0.380    -66.343      0.000     -25.952     -24.462
    Model_CayenneDiesel         -21.4012      0.377    -56.837      0.000     -22.139     -20.663
    Model_CayenneS              -21.6049      0.380    -56.876      0.000     -22.350     -20.860
    Model_CayenneTurbo          -21.6669      0.378    -57.262      0.000     -22.409     -20.925
    Model_Cayman2009-2012       -21.0656      0.376    -56.016      0.000     -21.803     -20.328
    Model_CediaSports          1.898e-12   8.27e-13      2.296      0.022    2.77e-13    3.52e-12
    Model_CelerioCNG             -0.0125      0.224     -0.056      0.955      -0.451       0.426
    Model_CelerioLDi             -0.4440      0.222     -2.001      0.046      -0.879      -0.009
    Model_CelerioLXI             -0.1513      0.182     -0.831      0.406      -0.508       0.205
    Model_CelerioVXI             -0.1642      0.161     -1.020      0.308      -0.480       0.151
    Model_CelerioZDi             -0.1584      0.222     -0.713      0.476      -0.594       0.277
    Model_CelerioZXI             -0.0793      0.164     -0.484      0.629      -0.401       0.242
    Model_Ciaz1.3                 0.3287      0.173      1.901      0.057      -0.010       0.668
    Model_Ciaz1.4                 0.4380      0.177      2.481      0.013       0.092       0.784
    Model_CiazAT                  0.3220      0.193      1.670      0.095      -0.056       0.700
    Model_CiazAlpha            3.153e-12   9.59e-13      3.287      0.001    1.27e-12    5.03e-12
    Model_CiazRS                  0.5411      0.222      2.436      0.015       0.106       0.977
    Model_CiazVDI                 0.2534      0.182      1.393      0.164      -0.103       0.610
    Model_CiazVDi                 0.3388      0.169      2.007      0.045       0.008       0.670
    Model_CiazVXi                 0.3646      0.176      2.066      0.039       0.019       0.711
    Model_CiazZDi                 0.4153      0.164      2.534      0.011       0.094       0.737
    Model_CiazZXi                 0.4435      0.170      2.602      0.009       0.109       0.778
    Model_CiazZeta                0.4020      0.193      2.083      0.037       0.024       0.780
    Model_City1.3                -4.1017      0.095    -43.123      0.000      -4.288      -3.915
    Model_City1.5                -4.0157      0.065    -61.754      0.000      -4.143      -3.888
    Model_CityCorporate          -3.9912      0.166    -24.090      0.000      -4.316      -3.666
    Model_CityV                  -3.9466      0.076    -52.148      0.000      -4.095      -3.798
    Model_CityZX                 -4.1572      0.073    -56.656      0.000      -4.301      -4.013
    Model_Cityi                  -3.8646      0.069    -55.892      0.000      -4.000      -3.729
    Model_Cityi-DTEC             -3.6114      0.128    -28.140      0.000      -3.863      -3.360
    Model_Cityi-VTEC             -3.8434      0.078    -49.402      0.000      -3.996      -3.691
    Model_Civic2006-2010         -4.0615      0.073    -55.839      0.000      -4.204      -3.919
    Model_Civic2010-2013         -4.1479      0.086    -47.979      0.000      -4.317      -3.978
    Model_Classic1.4           2.994e-12   6.37e-13      4.702      0.000    1.75e-12    4.24e-12
    Model_ClassicNova          -198.4473      3.106    -63.893      0.000    -204.537    -192.358
    Model_ClubmanCooper         -24.5038      0.417    -58.772      0.000     -25.321     -23.686
    Model_Compass1.4          -1.347e-12   7.93e-13     -1.698      0.090    -2.9e-12    2.09e-13
    Model_Compass2.0            -98.8187      1.559    -63.395      0.000    -101.875     -95.762
    Model_ContinentalFlying     -97.9801      1.556    -62.987      0.000    -101.030     -94.930
    Model_Cooper3               -24.6235      0.395    -62.333      0.000     -25.398     -23.849
    Model_Cooper5               -24.7427      0.397    -62.305      0.000     -25.521     -23.964
    Model_CooperConvertible     -24.4751      0.396    -61.837      0.000     -25.251     -23.699
    Model_CooperCountryman      -24.7224      0.401    -61.583      0.000     -25.509     -23.935
    Model_CooperS               -24.3721      0.400    -60.933      0.000     -25.156     -23.588
    Model_Corolla1.8             -6.1820      0.184    -33.655      0.000      -6.542      -5.822
    Model_CorollaAltis           -6.3750      0.106    -59.873      0.000      -6.584      -6.166
    Model_CorollaDX              -6.7961      0.177    -38.370      0.000      -7.143      -6.449
    Model_CorollaExecutive       -6.6601      0.180    -37.061      0.000      -7.012      -6.308
    Model_CorollaH2              -7.0250      0.180    -39.073      0.000      -7.378      -6.673
    Model_CorollaH4              -6.7367      0.122    -55.083      0.000      -6.976      -6.497
    Model_CorollaH5              -6.8531      0.145    -47.104      0.000      -7.138      -6.568
    Model_CountrymanCooper      -24.9604      0.417    -59.853      0.000     -25.778     -24.143
    Model_Creta1.4               -3.0382      0.088    -34.400      0.000      -3.211      -2.865
    Model_Creta1.6               -3.0055      0.066    -45.418      0.000      -3.135      -2.876
    Model_CrossPolo1.5           -7.5774      0.158    -47.815      0.000      -7.888      -7.267
    Model_CruzeLTZ               -9.0739      0.161    -56.266      0.000      -9.390      -8.758
    Model_D-MAXV-Cross          -99.0671      1.560    -63.498      0.000    -102.126     -96.008
    Model_Duster110PS            -9.5441      0.161    -59.357      0.000      -9.859      -9.229
    Model_Duster85PS             -9.6185      0.161    -59.782      0.000      -9.934      -9.303
    Model_DusterAdventure        -9.6232      0.217    -44.358      0.000     -10.049      -9.198
    Model_DusterPetrol           -9.8249      0.219    -44.916      0.000     -10.254      -9.396
    Model_DusterRXZ            7.797e-13   7.49e-13      1.041      0.298   -6.89e-13    2.25e-12
    Model_DzireAMT                0.1008      0.177      0.570      0.569      -0.246       0.448
    Model_DzireLDI                0.0997      0.223      0.448      0.654      -0.337       0.536
    Model_DzireNew                0.3045      0.222      1.369      0.171      -0.132       0.741
    Model_DzireVDI                0.2813      0.176      1.595      0.111      -0.065       0.627
    Model_DzireVXI                0.2521      0.181      1.390      0.165      -0.104       0.608
    Model_DzireZDI                0.2884      0.193      1.494      0.135      -0.090       0.667
    Model_E-Class200           3.363e-12   1.14e-12      2.940      0.003    1.12e-12    5.61e-12
    Model_E-Class2009-2013       -5.6691      0.094    -60.194      0.000      -5.854      -5.484
    Model_E-Class2015-2017       -5.5356      0.103    -53.498      0.000      -5.738      -5.333
    Model_E-Class220           9.172e-13   8.03e-13      1.143      0.253   -6.57e-13    2.49e-12
    Model_E-Class230             -5.9963      0.124    -48.173      0.000      -6.240      -5.752
    Model_E-Class250             -5.7455      0.167    -34.356      0.000      -6.073      -5.418
    Model_E-Class280             -5.9938      0.109    -55.188      0.000      -6.207      -5.781
    Model_E-ClassE               -5.2590      0.141    -37.223      0.000      -5.536      -4.982
    Model_E-ClassE250            -5.6625      0.111    -51.194      0.000      -5.879      -5.446
    Model_E-ClassE270            -5.8159      0.171    -33.989      0.000      -6.151      -5.480
    Model_E-ClassE350            -5.4948      0.177    -31.026      0.000      -5.842      -5.148
    Model_E-ClassE400            -5.0602      0.178    -28.381      0.000      -5.410      -4.711
    Model_E-ClassFacelift        -5.5342      0.179    -30.920      0.000      -5.885      -5.183
    Model_EON1.0              -1.499e-12   8.93e-13     -1.680      0.093   -3.25e-12    2.51e-13
    Model_EOND                   -3.9973      0.075    -53.633      0.000      -4.143      -3.851
    Model_EONEra                 -4.0156      0.073    -55.176      0.000      -4.158      -3.873
    Model_EONLPG              -4.633e-13   7.84e-13     -0.591      0.555      -2e-12    1.07e-12
    Model_EONMagna               -4.0661      0.079    -51.394      0.000      -4.221      -3.911
    Model_EONSportz              -4.0561      0.108    -37.581      0.000      -4.268      -3.844
    Model_EcoSport1.0            -5.8898      0.133    -44.183      0.000      -6.151      -5.628
    Model_EcoSport1.5            -5.9411      0.103    -57.553      0.000      -6.143      -5.739
    Model_Ecosport1.0            -5.8605      0.180    -32.604      0.000      -6.213      -5.508
    Model_Ecosport1.5            -5.8902      0.101    -58.596      0.000      -6.087      -5.693
    Model_EcosportSignature      -5.9285      0.147    -40.462      0.000      -6.216      -5.641
    Model_Eeco5                  -0.3819      0.178     -2.144      0.032      -0.731      -0.033
    Model_Eeco7                  -0.3282      0.172     -1.909      0.056      -0.665       0.009
    Model_EecoCNG              1.696e-12   8.65e-13      1.960      0.050   -1.23e-16    3.39e-12
    Model_EecoSmiles          -7.436e-14   6.05e-13     -0.123      0.902   -1.26e-12    1.11e-12
    Model_Elantra1.6             -3.0195      0.124    -24.297      0.000      -3.263      -2.776
    Model_Elantra2.0             -2.7589      0.126    -21.923      0.000      -3.006      -2.512
    Model_ElantraCRDi            -3.0443      0.076    -40.255      0.000      -3.193      -2.896
    Model_ElantraSX              -2.9220      0.165    -17.670      0.000      -3.246      -2.598
    Model_Elitei20               -3.4032      0.078    -43.355      0.000      -3.557      -3.249
    Model_Endeavour2.2           -5.0802      0.134    -37.931      0.000      -5.343      -4.818
    Model_Endeavour2.5L          -5.5773      0.137    -40.693      0.000      -5.846      -5.309
    Model_Endeavour3.0L          -5.6725      0.128    -44.488      0.000      -5.922      -5.422
    Model_Endeavour3.2           -5.0355      0.123    -40.974      0.000      -5.276      -4.794
    Model_Endeavour4x2           -5.5870      0.154    -36.214      0.000      -5.890      -5.285
    Model_EndeavourHurricane     -5.7893      0.153    -37.919      0.000      -6.089      -5.490
    Model_EndeavourTitanium   -7.985e-14   4.78e-13     -0.167      0.867   -1.02e-12    8.58e-13
    Model_EndeavourXLT           -5.4897      0.149    -36.882      0.000      -5.782      -5.198
    Model_Enjoy1.3               -9.3697      0.192    -48.893      0.000      -9.745      -8.994
    Model_Enjoy1.4               -9.3457      0.218    -42.903      0.000      -9.773      -8.919
    Model_EnjoyPetrol            -9.4752      0.214    -44.216      0.000      -9.895      -9.055
    Model_EnjoyTCDi              -9.4791      0.180    -52.708      0.000      -9.832      -9.127
    Model_ErtigaLXI               0.4891      0.225      2.172      0.030       0.048       0.931
    Model_ErtigaPaseo             0.2952      0.226      1.307      0.191      -0.148       0.738
    Model_ErtigaSHVS              0.3766      0.173      2.174      0.030       0.037       0.716
    Model_ErtigaVDI               0.4251      0.167      2.546      0.011       0.098       0.753
    Model_ErtigaVXI               0.4107      0.172      2.390      0.017       0.074       0.747
    Model_ErtigaZDI               0.4537      0.167      2.715      0.007       0.126       0.781
    Model_ErtigaZXI               0.4604      0.186      2.479      0.013       0.096       0.825
    Model_EsteemLX               -0.4936      0.222     -2.227      0.026      -0.928      -0.059
    Model_EsteemVxi              -0.4946      0.175     -2.828      0.005      -0.838      -0.152
    Model_EstiloLXI              -0.1249      0.181     -0.691      0.489      -0.479       0.229
    Model_Etios1.4            -1.591e-12   7.08e-13     -2.246      0.025   -2.98e-12   -2.02e-13
    Model_EtiosCross             -6.8298      0.130    -52.610      0.000      -7.084      -6.575
    Model_EtiosG                 -6.9041      0.124    -55.793      0.000      -7.147      -6.662
    Model_EtiosGD                -6.8827      0.124    -55.525      0.000      -7.126      -6.640
    Model_EtiosLiva              -7.0123      0.111    -63.387      0.000      -7.229      -6.795
    Model_EtiosPetrol            -6.8233      0.185    -36.954      0.000      -7.185      -6.461
    Model_EtiosV                 -6.8643      0.184    -37.367      0.000      -7.225      -6.504
    Model_EtiosVD                -6.5880      0.135    -48.863      0.000      -6.852      -6.324
    Model_EtiosVX                -6.8650      0.149    -45.942      0.000      -7.158      -6.572
    Model_EtiosVXD            -2.356e-14   9.31e-13     -0.025      0.980   -1.85e-12     1.8e-12
    Model_Evalia2013            -13.4928      0.261    -51.715      0.000     -14.004     -12.981
    Model_FType                 -19.3359      0.349    -55.456      0.000     -20.020     -18.652
    Model_Fabia1.2               -6.4629      0.115    -56.273      0.000      -6.688      -6.238
    Model_Fabia1.2L              -6.3157      0.177    -35.664      0.000      -6.663      -5.969
    Model_Fabia1.4               -6.0552      0.177    -34.241      0.000      -6.402      -5.708
    Model_Fabia1.6               -6.5606      0.179    -36.584      0.000      -6.912      -6.209
    Model_Fiesta1.4              -6.3168      0.101    -62.706      0.000      -6.514      -6.119
    Model_Fiesta1.5              -6.0775      0.180    -33.728      0.000      -6.431      -5.724
    Model_Fiesta1.6              -6.4219      0.119    -53.896      0.000      -6.655      -6.188
    Model_FiestaClassic          -6.4737      0.118    -54.844      0.000      -6.705      -6.242
    Model_FiestaDiesel           -5.7724      0.178    -32.492      0.000      -6.121      -5.424
    Model_FiestaEXi              -6.3823      0.143    -44.766      0.000      -6.662      -6.103
    Model_FiestaTitanium       3.366e-13   7.26e-13      0.464      0.643   -1.09e-12    1.76e-12
    Model_Figo1.2P             3.353e-13      6e-13      0.558      0.577   -8.42e-13    1.51e-12
    Model_Figo1.5D               -6.0593      0.141    -42.902      0.000      -6.336      -5.782
    Model_Figo2015-2019          -6.3651      0.113    -56.397      0.000      -6.586      -6.144
    Model_FigoAspire             -6.2831      0.113    -55.662      0.000      -6.504      -6.062
    Model_FigoDiesel             -6.4529      0.100    -64.730      0.000      -6.648      -6.257
    Model_FigoPetrol             -6.3359      0.105    -60.412      0.000      -6.541      -6.130
    Model_FigoTitanium           -6.6700      0.181    -36.926      0.000      -7.024      -6.316
    Model_Fluence1.5             -9.9159      0.215    -46.065      0.000     -10.338      -9.494
    Model_Fluence2.0             -9.6142      0.215    -44.796      0.000     -10.035      -9.193
    Model_FluenceDiesel          -9.7868      0.187    -52.206      0.000     -10.154      -9.419
    Model_Fortuner2.8            -5.7408      0.124    -46.118      0.000      -5.985      -5.497
    Model_Fortuner3.0            -5.8331      0.116    -50.390      0.000      -6.060      -5.606
    Model_Fortuner4x2            -5.7433      0.116    -49.371      0.000      -5.971      -5.515
    Model_Fortuner4x4            -5.7713      0.127    -45.605      0.000      -6.019      -5.523
    Model_FortunerTRD            -5.8636      0.186    -31.511      0.000      -6.228      -5.499
    Model_FortwoCDI             -99.1499      1.556    -63.725      0.000    -102.200     -96.099
    Model_FreestyleTitanium      -5.6613      0.140    -40.353      0.000      -5.936      -5.386
    Model_FusionPlus             -6.0965      0.177    -34.478      0.000      -6.443      -5.750
    Model_GL-Class2007           -5.1038      0.118    -43.332      0.000      -5.335      -4.873
    Model_GL-Class350            -5.1340      0.149    -34.357      0.000      -5.427      -4.841
    Model_GLAClass               -5.7046      0.104    -54.912      0.000      -5.908      -5.501
    Model_GLC220                 -5.4271      0.141    -38.385      0.000      -5.704      -5.150
    Model_GLC220d                -5.3741      0.141    -38.093      0.000      -5.651      -5.098
    Model_GLC43                  -5.3052      0.182    -29.170      0.000      -5.662      -4.949
    Model_GLE250d                -5.1583      0.119    -43.309      0.000      -5.392      -4.925
    Model_GLE350d                -5.2360      0.113    -46.495      0.000      -5.457      -5.015
    Model_GLS350d                -5.0848      0.151    -33.597      0.000      -5.382      -4.788
    Model_GONXT                 -28.3479      0.464    -61.029      0.000     -29.259     -27.437
    Model_GOPlus                -28.2127      0.461    -61.159      0.000     -29.117     -27.308
    Model_GOT                   -28.5319      0.466    -61.167      0.000     -29.446     -27.617
    Model_GallardoCoupe         -97.7633      1.557    -62.777      0.000    -100.817     -94.710
    Model_Getz1.3              4.582e-13   7.91e-13      0.579      0.562   -1.09e-12    2.01e-12
    Model_Getz1.5                -3.9572      0.163    -24.223      0.000      -4.278      -3.637
    Model_GetzGLE                -3.7616      0.102    -36.991      0.000      -3.961      -3.562
    Model_GetzGLS                -4.2214      0.092    -45.985      0.000      -4.401      -4.041
    Model_GetzGVS                -3.3482      0.162    -20.704      0.000      -3.665      -3.031
    Model_GrandVitara             0.5830      0.197      2.953      0.003       0.196       0.970
    Model_GrandePunto           -15.2656      0.257    -59.464      0.000     -15.769     -14.762
    Model_Grandi10               -3.6844      0.063    -58.926      0.000      -3.807      -3.562
    Model_HexaXT                 -3.7729      0.143    -26.309      0.000      -4.054      -3.492
    Model_HexaXTA                -3.7857      0.178    -21.261      0.000      -4.135      -3.437
    Model_Ignis1.2               -0.2619      0.193     -1.358      0.175      -0.640       0.116
    Model_Ignis1.3                0.0508      0.222      0.228      0.819      -0.385       0.487
    Model_Ikon1.3                -6.4491      0.110    -58.648      0.000      -6.665      -6.234
    Model_Ikon1.4                -7.2269      0.179    -40.318      0.000      -7.578      -6.875
    Model_Ikon1.6                -6.6108      0.173    -38.118      0.000      -6.951      -6.271
    Model_IndicaDLS              -5.3396      0.116    -46.008      0.000      -5.567      -5.112
    Model_IndicaGLS              -5.0438      0.169    -29.839      0.000      -5.375      -4.712
    Model_IndicaLEI              -4.9374      0.168    -29.340      0.000      -5.267      -4.607
    Model_IndicaV2               -5.2222      0.092    -57.000      0.000      -5.402      -5.043
    Model_IndicaVista            -4.9346      0.092    -53.536      0.000      -5.115      -4.754
    Model_IndigoCS               -4.9044      0.097    -50.510      0.000      -5.095      -4.714
    Model_IndigoGLE              -5.1794      0.133    -38.836      0.000      -5.441      -4.918
    Model_IndigoLS               -4.9978      0.106    -47.154      0.000      -5.206      -4.790
    Model_IndigoLX               -5.0369      0.109    -46.385      0.000      -5.250      -4.824
    Model_IndigoXL             -2.02e-12   9.41e-13     -2.147      0.032   -3.86e-12   -1.75e-13
    Model_IndigoeCS              -4.9951      0.105    -47.763      0.000      -5.200      -4.790
    Model_Innova2.0              -6.0566      0.134    -45.055      0.000      -6.320      -5.793
    Model_Innova2.5              -6.1405      0.114    -53.762      0.000      -6.364      -5.917
    Model_InnovaCrysta           -6.1272      0.118    -51.854      0.000      -6.359      -5.896
    Model_Jazz1.2                -4.1526      0.077    -54.278      0.000      -4.303      -4.003
    Model_Jazz1.5                -4.0671      0.085    -48.050      0.000      -4.233      -3.901
    Model_JazzActive             -4.3858      0.165    -26.652      0.000      -4.708      -4.063
    Model_JazzExclusive          -4.1900      0.167    -25.148      0.000      -4.517      -3.863
    Model_JazzMode               -4.3447      0.165    -26.382      0.000      -4.668      -4.022
    Model_JazzS                  -4.2829      0.124    -34.425      0.000      -4.527      -4.039
    Model_JazzSelect             -4.2641      0.125    -34.169      0.000      -4.509      -4.019
    Model_JazzV                  -4.0543      0.096    -42.359      0.000      -4.242      -3.867
    Model_JazzVX                 -4.1028      0.102    -40.267      0.000      -4.303      -3.903
    Model_JeepMM                 -4.2671      0.128    -33.320      0.000      -4.518      -4.016
    Model_Jetta2007-2011         -7.1305      0.128    -55.491      0.000      -7.382      -6.879
    Model_Jetta2012-2014         -6.9819      0.135    -51.810      0.000      -7.246      -6.718
    Model_Jetta2013-2015         -6.9901      0.130    -53.585      0.000      -7.246      -6.734
    Model_KUV100                 -4.8161      0.093    -51.601      0.000      -4.999      -4.633
    Model_KWID1.0               -10.3027      0.172    -59.733      0.000     -10.641      -9.964
    Model_KWIDAMT               -10.5141      0.217    -48.431      0.000     -10.940     -10.088
    Model_KWIDClimber           -10.3249      0.175    -58.880      0.000     -10.669      -9.981
    Model_KWIDRXL               -10.6211      0.188    -56.634      0.000     -10.989     -10.253
    Model_KWIDRXT               -10.3963      0.161    -64.647      0.000     -10.712     -10.081
    Model_Koleos2.0              -9.2702      0.181    -51.192      0.000      -9.625      -8.915
    Model_Lancer1.5             -25.4607      0.404    -63.075      0.000     -26.252     -24.669
    Model_LancerGLXD            -24.6319      0.407    -60.487      0.000     -25.430     -23.833
    Model_Laura1.8               -5.9655      0.129    -46.238      0.000      -6.218      -5.713
    Model_Laura1.9               -5.7571      0.120    -48.087      0.000      -5.992      -5.522
    Model_LauraAmbiente          -5.7958      0.115    -50.222      0.000      -6.022      -5.570
    Model_LauraAmbition          -5.7674      0.128    -44.972      0.000      -6.019      -5.516
    Model_LauraClassic           -6.0986      0.178    -34.311      0.000      -6.447      -5.750
    Model_LauraElegance          -5.8082      0.127    -45.618      0.000      -6.058      -5.559
    Model_LauraL                 -6.0911      0.141    -43.049      0.000      -6.369      -5.814
    Model_LauraRS                -5.6934      0.180    -31.668      0.000      -6.046      -5.341
    Model_Linea1.3              -14.9663      0.284    -52.623      0.000     -15.524     -14.409
    Model_LineaClassic          -15.3956      0.284    -54.134      0.000     -15.953     -14.838
    Model_LineaEmotion          -15.0820      0.257    -58.573      0.000     -15.587     -14.577
    Model_LineaT                -15.4314      0.282    -54.735      0.000     -15.984     -14.879
    Model_LineaT-Jet            -14.9673      0.285    -52.565      0.000     -15.526     -14.409
    Model_Lodgy110PS             -9.6851      0.201    -48.158      0.000     -10.079      -9.291
    Model_LoganDiesel            -4.9742      0.173    -28.832      0.000      -5.312      -4.636
    Model_LoganPetrol            -5.1021      0.170    -29.953      0.000      -5.436      -4.768
    Model_M-ClassML              -5.4614      0.098    -55.736      0.000      -5.654      -5.269
    Model_MUX4WD              -3.281e-12   8.43e-13     -3.889      0.000   -4.93e-12   -1.63e-12
    Model_ManzaAqua              -4.9587      0.116    -42.575      0.000      -5.187      -4.730
    Model_ManzaAura              -4.9780      0.108    -46.159      0.000      -5.189      -4.767
    Model_ManzaClub              -4.8096      0.171    -28.121      0.000      -5.145      -4.474
    Model_ManzaELAN              -4.5108      0.133    -33.832      0.000      -4.772      -4.249
    Model_MicraActive           -13.5738      0.223    -60.756      0.000     -14.012     -13.136
    Model_MicraDiesel           -13.4818      0.212    -63.576      0.000     -13.898     -13.066
    Model_MicraXE               -13.3985      0.252    -53.151      0.000     -13.893     -12.904
    Model_MicraXL               -13.6417      0.223    -61.171      0.000     -14.079     -13.204
    Model_MicraXV               -13.3898      0.216    -62.032      0.000     -13.813     -12.967
    Model_MobilioE               -4.1256      0.171    -24.182      0.000      -4.460      -3.791
    Model_MobilioRS              -4.0275      0.132    -30.420      0.000      -4.287      -3.768
    Model_MobilioS               -4.1556      0.103    -40.275      0.000      -4.358      -3.953
    Model_MobilioV               -3.9936      0.117    -34.095      0.000      -4.223      -3.764
    Model_Montero3.2            -24.4339      0.416    -58.706      0.000     -25.250     -23.618
    Model_MustangV8              -4.6117      0.196    -23.492      0.000      -4.997      -4.227
    Model_NanoCX                 -5.5041      0.172    -32.001      0.000      -5.841      -5.167
    Model_NanoCx                 -5.7012      0.133    -42.862      0.000      -5.962      -5.440
    Model_NanoLX                 -5.5080      0.132    -41.629      0.000      -5.767      -5.249
    Model_NanoLx                 -5.6141      0.117    -48.139      0.000      -5.843      -5.385
    Model_NanoSTD                -6.0777      0.173    -35.202      0.000      -6.416      -5.739
    Model_NanoTwist              -5.4198      0.102    -53.096      0.000      -5.620      -5.220
    Model_NanoXT                 -5.3621      0.134    -39.927      0.000      -5.625      -5.099
    Model_NanoXTA                -5.1968      0.103    -50.364      0.000      -5.399      -4.994
    Model_NewC-Class             -5.7923      0.091    -63.940      0.000      -5.970      -5.615
    Model_NewSafari              -4.4754      0.105    -42.567      0.000      -4.682      -4.269
    Model_Nexon1.2             2.531e-12   9.72e-13      2.603      0.009    6.25e-13    4.44e-12
    Model_Nexon1.5               -4.1240      0.173    -23.884      0.000      -4.463      -3.785
    Model_NuvoSportN6            -4.7146      0.171    -27.545      0.000      -5.050      -4.379
    Model_NuvoSportN8         -2.615e-12   8.19e-13     -3.195      0.001   -4.22e-12   -1.01e-12
    Model_Octavia1.9             -6.2216      0.177    -35.204      0.000      -6.568      -5.875
    Model_Octavia2.0             -5.2702      0.145    -36.472      0.000      -5.554      -4.987
    Model_OctaviaAmbiente        -6.0928      0.125    -48.593      0.000      -6.339      -5.847
    Model_OctaviaAmbition        -5.3890      0.124    -43.577      0.000      -5.631      -5.147
    Model_OctaviaClassic         -6.0666      0.138    -44.022      0.000      -6.337      -5.796
    Model_OctaviaElegance        -5.3664      0.112    -48.092      0.000      -5.585      -5.148
    Model_OctaviaL               -6.0409      0.176    -34.270      0.000      -6.387      -5.695
    Model_OctaviaRS              -6.2112      0.176    -35.205      0.000      -6.557      -5.865
    Model_OctaviaRider           -6.1500      0.140    -43.943      0.000      -6.424      -5.876
    Model_OctaviaStyle        -2.409e-12   5.73e-13     -4.206      0.000   -3.53e-12   -1.29e-12
    Model_Omni5                  -0.4261      0.193     -2.212      0.027      -0.804      -0.048
    Model_Omni8                  -0.5503      0.173     -3.187      0.001      -0.889      -0.212
    Model_OmniE                  -0.5724      0.186     -3.078      0.002      -0.937      -0.208
    Model_OmniMPI                -0.5397      0.182     -2.965      0.003      -0.897      -0.183
    Model_OneLX                 -98.8717      1.560    -63.395      0.000    -101.930     -95.814
    Model_Optra1.6               -9.3441      0.181    -51.752      0.000      -9.698      -8.990
    Model_OptraMagnum            -9.4623      0.158    -59.936      0.000      -9.772      -9.153
    Model_Outlander2.4          -24.7546      0.402    -61.557      0.000     -25.543     -23.966
    Model_Pajero2.8             -24.5173      0.397    -61.800      0.000     -25.295     -23.739
    Model_Pajero4X4             -24.7015      0.422    -58.476      0.000     -25.530     -23.873
    Model_PajeroSport           -24.5237      0.404    -60.764      0.000     -25.315     -23.732
    Model_Panamera2010          -21.0899      0.378    -55.822      0.000     -21.831     -20.349
    Model_PanameraDiesel        -21.1472      0.354    -59.682      0.000     -21.842     -20.453
    Model_Passat1.8              -7.2679      0.191    -38.087      0.000      -7.642      -6.894
    Model_Passat2.0            1.777e-12   9.31e-13      1.910      0.056   -4.75e-14     3.6e-12
    Model_PassatDiesel           -6.9680      0.137    -50.840      0.000      -7.237      -6.699
    Model_PassatHighline         -6.9587      0.190    -36.709      0.000      -7.330      -6.587
    Model_Petra1.2              -15.5801      0.275    -56.573      0.000     -16.120     -15.040
    Model_PlatinumEtios       -1.697e-12   7.22e-13     -2.352      0.019   -3.11e-12   -2.82e-13
    Model_Polo1.0                -7.4980      0.193    -38.864      0.000      -7.876      -7.120
    Model_Polo1.2                -7.5363      0.121    -62.142      0.000      -7.774      -7.299
    Model_Polo1.5                -7.5771      0.124    -61.227      0.000      -7.820      -7.334
    Model_PoloDiesel             -7.5681      0.120    -63.329      0.000      -7.802      -7.334
    Model_PoloGT                 -7.4551      0.131    -56.957      0.000      -7.712      -7.199
    Model_PoloGTI                -7.6520      0.160    -47.958      0.000      -7.965      -7.339
    Model_PoloIPL             -8.871e-13   6.45e-13     -1.375      0.169   -2.15e-12    3.78e-13
    Model_PoloPetrol             -7.5474      0.119    -63.644      0.000      -7.780      -7.315
    Model_PulsePetrol            -9.9911      0.218    -45.832      0.000     -10.419      -9.564
    Model_PulseRxL              -10.0897      0.179    -56.406      0.000     -10.440      -9.739
    Model_Punto1.2              -15.4665      0.285    -54.319      0.000     -16.025     -14.908
    Model_Punto1.3              -15.2772      0.279    -54.698      0.000     -15.825     -14.730
    Model_Punto1.4              -15.4527      0.280    -55.182      0.000     -16.002     -14.904
    Model_PuntoEVO             6.954e-13   9.99e-13      0.696      0.486   -1.26e-12    2.65e-12
    Model_Q32.0                  -6.7812      0.126    -53.993      0.000      -7.027      -6.535
    Model_Q32012-2015            -6.7844      0.123    -55.067      0.000      -7.026      -6.543
    Model_Q330                -4.369e-12   1.06e-12     -4.103      0.000   -6.46e-12   -2.28e-12
    Model_Q335                   -6.7832      0.127    -53.427      0.000      -7.032      -6.534
    Model_Q52.0                  -6.4911      0.118    -55.049      0.000      -6.722      -6.260
    Model_Q52008-2012            -6.5431      0.123    -53.283      0.000      -6.784      -6.302
    Model_Q53.0                  -6.5681      0.152    -43.231      0.000      -6.866      -6.270
    Model_Q530                   -6.3940      0.124    -51.564      0.000      -6.637      -6.151
    Model_Q73.0                  -6.4069      0.122    -52.635      0.000      -6.646      -6.168
    Model_Q735                   -6.3184      0.133    -47.499      0.000      -6.579      -6.058
    Model_Q74.2                  -6.4492      0.140    -46.133      0.000      -6.723      -6.175
    Model_Q745                   -6.0710      0.141    -43.205      0.000      -6.346      -5.795
    Model_QualisFS               -6.1899      0.198    -31.232      0.000      -6.578      -5.801
    Model_QualisFleet            -6.2484      0.202    -30.868      0.000      -6.645      -5.852
    Model_QualisRS               -6.1888      0.202    -30.568      0.000      -6.586      -5.792
    Model_QuantoC2               -4.9564      0.170    -29.237      0.000      -5.289      -4.624
    Model_QuantoC4               -4.7584      0.168    -28.282      0.000      -5.088      -4.429
    Model_QuantoC6               -4.7912      0.168    -28.493      0.000      -5.121      -4.461
    Model_QuantoC8               -4.7291      0.129    -36.567      0.000      -4.983      -4.476
    Model_R-ClassR350            -5.5495      0.133    -41.697      0.000      -5.810      -5.289
    Model_RS5Coupe               -6.2802      0.164    -38.253      0.000      -6.602      -5.958
    Model_Rapid1.5               -5.9734      0.106    -56.206      0.000      -6.182      -5.765
    Model_Rapid1.6               -6.0211      0.103    -58.223      0.000      -6.224      -5.818
    Model_Rapid2013-2016       9.712e-13   8.39e-13      1.158      0.247   -6.74e-13    2.62e-12
    Model_RapidLeisure         1.033e-12   9.85e-13      1.048      0.295   -8.99e-13    2.96e-12
    Model_RapidUltima            -6.3048      0.180    -35.032      0.000      -6.658      -5.952
    Model_RediGO                -28.6261      0.466    -61.479      0.000     -29.539     -27.713
    Model_RenaultLogan           -4.6565      0.167    -27.842      0.000      -4.984      -4.329
    Model_RitzAT              -2.691e-13   5.82e-13     -0.463      0.644   -1.41e-12    8.71e-13
    Model_RitzLDi                -0.0513      0.172     -0.297      0.766      -0.389       0.287
    Model_RitzLXI                 0.0873      0.221      0.394      0.693      -0.347       0.521
    Model_RitzLXi                -0.0871      0.192     -0.453      0.650      -0.464       0.290
    Model_RitzVDI                -0.0707      0.222     -0.319      0.750      -0.506       0.364
    Model_RitzVDi                -0.0407      0.161     -0.252      0.801      -0.357       0.275
    Model_RitzVXI                -0.0744      0.176     -0.424      0.672      -0.419       0.270
    Model_RitzVXi                -0.0806      0.170     -0.475      0.635      -0.413       0.252
    Model_RitzZDi                -0.0922      0.192     -0.479      0.632      -0.469       0.285
    Model_RitzZXI              8.212e-13   5.89e-13      1.395      0.163   -3.33e-13    1.98e-12
    Model_RitzZXi                 0.0438      0.222      0.198      0.843      -0.391       0.478
    Model_RoverDiscovery        -49.2817      0.784    -62.827      0.000     -50.820     -47.744
    Model_RoverFreelander       -49.4741      0.779    -63.538      0.000     -51.001     -47.947
    Model_RoverRange            -48.8920      0.778    -62.862      0.000     -50.417     -47.367
    Model_S-Class280          -2.875e-12   9.05e-13     -3.176      0.002   -4.65e-12    -1.1e-12
    Model_S-Class320             -5.2182      0.172    -30.265      0.000      -5.556      -4.880
    Model_S-ClassS               -5.3763      0.176    -30.618      0.000      -5.721      -5.032
    Model_S-CrossAlpha        -8.134e-13   9.46e-13     -0.860      0.390   -2.67e-12    1.04e-12
    Model_S-CrossDelta            0.3236      0.222      1.456      0.145      -0.112       0.759
    Model_S-CrossZeta          7.269e-13   7.45e-13      0.975      0.330   -7.35e-13    2.19e-12
    Model_S60D3                3.334e-13   6.58e-13      0.506      0.613   -9.57e-13    1.62e-12
    Model_S60D4                 -21.9055      0.362    -60.454      0.000     -22.616     -21.195
    Model_S60D5                 -21.9008      0.374    -58.601      0.000     -22.634     -21.168
    Model_S802006-2013          -22.4719      0.372    -60.402      0.000     -23.201     -21.742
    Model_S80D5               -2.341e-12   7.82e-13     -2.993      0.003   -3.88e-12   -8.07e-13
    Model_SClass                 -5.4716      0.102    -53.520      0.000      -5.672      -5.271
    Model_SCross                  0.4283      0.173      2.479      0.013       0.090       0.767
    Model_SL-ClassSL             -5.0114      0.190    -26.389      0.000      -5.384      -4.639
    Model_SLC43                  -5.2052      0.153    -34.066      0.000      -5.505      -4.906
    Model_SLK-Class55            -4.8603      0.191    -25.506      0.000      -5.234      -4.487
    Model_SLK-ClassSLK           -5.2134      0.149    -35.046      0.000      -5.505      -4.922
    Model_SX4Green               -0.1877      0.224     -0.838      0.402      -0.627       0.252
    Model_SX4S                    0.4418      0.171      2.591      0.010       0.107       0.776
    Model_SX4VDI                  0.1712      0.222      0.772      0.440      -0.264       0.606
    Model_SX4Vxi                  0.0339      0.167      0.203      0.839      -0.293       0.361
    Model_SX4ZDI                  0.1272      0.176      0.723      0.469      -0.217       0.472
    Model_SX4ZXI                  0.1013      0.170      0.596      0.551      -0.232       0.435
    Model_SX4Zxi                  0.0269      0.176      0.153      0.878      -0.317       0.371
    Model_SafariDICOR            -4.4303      0.176    -25.135      0.000      -4.776      -4.085
    Model_SafariStorme           -4.0820      0.113    -36.283      0.000      -4.303      -3.861
    Model_Sail1.2                -9.2441      0.183    -50.444      0.000      -9.603      -8.885
    Model_SailHatchback          -9.5374      0.168    -56.691      0.000      -9.867      -9.208
    Model_SailLT                 -9.6807      0.212    -45.563      0.000     -10.097      -9.264
    Model_SantaFe                -2.8409      0.089    -31.743      0.000      -3.016      -2.665
    Model_SantroAT               -3.6891      0.164    -22.455      0.000      -4.011      -3.367
    Model_SantroD                -3.9024      0.161    -24.238      0.000      -4.218      -3.587
    Model_SantroDX             4.293e-13   9.89e-13      0.434      0.664   -1.51e-12    2.37e-12
    Model_SantroGLS              -3.9380      0.095    -41.662      0.000      -4.123      -3.753
    Model_SantroGS               -3.6835      0.123    -29.847      0.000      -3.925      -3.441
    Model_SantroLP               -3.5956      0.162    -22.174      0.000      -3.913      -3.278
    Model_SantroLS               -4.2427      0.164    -25.836      0.000      -4.565      -3.921
    Model_SantroXing             -3.9233      0.059    -66.173      0.000      -4.039      -3.807
    Model_ScalaDiesel            -9.7957      0.218    -44.930      0.000     -10.223      -9.368
    Model_ScalaRxL              -10.1425      0.187    -54.335      0.000     -10.508      -9.776
    Model_Scorpio1.99            -4.0953      0.132    -31.114      0.000      -4.353      -3.837
    Model_Scorpio2.6             -4.3486      0.098    -44.514      0.000      -4.540      -4.157
    Model_Scorpio2009-2014       -4.1909      0.105    -39.730      0.000      -4.398      -3.984
    Model_ScorpioDX              -4.1549      0.167    -24.817      0.000      -4.483      -3.827
    Model_ScorpioLX              -4.3753      0.129    -33.908      0.000      -4.628      -4.122
    Model_ScorpioS10             -3.9778      0.116    -34.373      0.000      -4.205      -3.751
    Model_ScorpioS2           -1.983e-12   7.01e-13     -2.827      0.005   -3.36e-12   -6.08e-13
    Model_ScorpioS4              -4.2234      0.131    -32.255      0.000      -4.480      -3.967
    Model_ScorpioS6              -3.9975      0.116    -34.378      0.000      -4.226      -3.770
    Model_ScorpioS8              -4.0309      0.133    -30.228      0.000      -4.292      -3.769
    Model_ScorpioSLE             -4.2456      0.099    -42.817      0.000      -4.440      -4.051
    Model_ScorpioSLX           7.218e-13   6.18e-13      1.167      0.243   -4.91e-13    1.93e-12
    Model_ScorpioVLX             -4.1591      0.086    -48.565      0.000      -4.327      -3.991
    Model_Siena1.2              -15.6515      0.276    -56.758      0.000     -16.192     -15.111
    Model_Sonata2.4           -3.116e-13   9.35e-13     -0.333      0.739   -2.15e-12    1.52e-12
    Model_SonataEmbera           -3.0664      0.124    -24.821      0.000      -3.309      -2.824
    Model_SonataGOLD             -3.6291      0.161    -22.552      0.000      -3.945      -3.314
    Model_SonataTransform       1.18e-12   5.49e-13      2.151      0.032    1.04e-13    2.26e-12
    Model_Spark1.0               -9.6900      0.169    -57.406      0.000     -10.021      -9.359
    Model_SsangyongRexton        -3.9533      0.091    -43.240      0.000      -4.133      -3.774
    Model_SumoDX                 -4.2895      0.197    -21.759      0.000      -4.676      -3.903
    Model_SumoDelux              -4.2354      0.175    -24.177      0.000      -4.579      -3.892
    Model_SumoEX                 -4.6838      0.182    -25.702      0.000      -5.041      -4.327
    Model_Sunny2011-2014        -13.3117      0.211    -63.134      0.000     -13.725     -12.898
    Model_SunnyDiesel           -13.1698      0.255    -51.587      0.000     -13.670     -12.669
    Model_SunnyXE             -4.512e-13   8.05e-13     -0.560      0.575   -2.03e-12    1.13e-12
    Model_SunnyXL               -12.9906      0.254    -51.168      0.000     -13.488     -12.493
    Model_SunnyXV               -13.1788      0.231    -57.144      0.000     -13.631     -12.727
    Model_Superb1.8              -5.6058      0.122    -45.888      0.000      -5.845      -5.366
    Model_Superb2.5           -4.392e-13   5.63e-13     -0.780      0.435   -1.54e-12    6.65e-13
    Model_Superb2.8              -5.6891      0.138    -41.244      0.000      -5.960      -5.419
    Model_Superb2009-2014        -4.9904      0.182    -27.432      0.000      -5.347      -4.634
    Model_Superb3.6            2.376e-12   8.27e-13      2.873      0.004    7.54e-13       4e-12
    Model_SuperbAmbition         -5.5036      0.178    -30.850      0.000      -5.853      -5.154
    Model_SuperbElegance         -5.4992      0.102    -53.887      0.000      -5.699      -5.299
    Model_SuperbL&K              -5.0246      0.148    -33.986      0.000      -5.315      -4.735
    Model_SuperbStyle            -5.4082      0.124    -43.545      0.000      -5.652      -5.165
    Model_Swift1.3                0.1239      0.167      0.741      0.459      -0.204       0.452
    Model_SwiftAMT                0.0855      0.193      0.442      0.659      -0.294       0.465
    Model_SwiftDDiS               0.1151      0.182      0.633      0.527      -0.242       0.472
    Model_SwiftDzire              0.1869      0.158      1.180      0.238      -0.124       0.498
    Model_SwiftLDI                0.0474      0.171      0.278      0.781      -0.287       0.382
    Model_SwiftLXI                0.2370      0.181      1.309      0.191      -0.118       0.592
    Model_SwiftLXi                0.0941      0.221      0.426      0.670      -0.339       0.527
    Model_SwiftLdi                0.1085      0.173      0.628      0.530      -0.230       0.447
    Model_SwiftLxi               -0.1151      0.176     -0.656      0.512      -0.459       0.229
    Model_SwiftRS                 0.1850      0.222      0.835      0.404      -0.249       0.619
    Model_SwiftVDI                0.1551      0.159      0.975      0.330      -0.157       0.467
    Model_SwiftVDi            -2.104e-12   5.37e-13     -3.918      0.000   -3.16e-12   -1.05e-12
    Model_SwiftVVT                0.1283      0.193      0.666      0.506      -0.249       0.506
    Model_SwiftVXI                0.0927      0.161      0.577      0.564      -0.223       0.408
    Model_SwiftVXi                0.0176      0.182      0.097      0.923      -0.338       0.374
    Model_SwiftVdi                0.1839      0.176      1.047      0.295      -0.160       0.528
    Model_SwiftZDI                0.1386      0.222      0.625      0.532      -0.296       0.574
    Model_SwiftZDi                0.2342      0.166      1.410      0.159      -0.092       0.560
    Model_SwiftZXI                0.2081      0.172      1.208      0.227      -0.130       0.546
    Model_TT2.0                  -6.4847      0.190    -34.170      0.000      -6.857      -6.113
    Model_TT40                   -5.9077      0.182    -32.539      0.000      -6.264      -5.552
    Model_TUV300                 -4.5025      0.098    -46.041      0.000      -4.694      -4.311
    Model_TaveraLS               -9.3899      0.236    -39.789      0.000      -9.853      -8.927
    Model_TaveraLT               -8.8982      0.226    -39.349      0.000      -9.342      -8.455
    Model_Teana230jM          -1.144e-12   1.03e-12     -1.113      0.266   -3.16e-12    8.71e-13
    Model_TeanaXV               -12.8632      0.262    -49.188      0.000     -13.376     -12.351
    Model_TerranoXL             -13.0631      0.214    -60.928      0.000     -13.483     -12.643
    Model_TerranoXV             -13.0035      0.216    -60.286      0.000     -13.426     -12.581
    Model_TharCRDe               -4.2702      0.116    -36.950      0.000      -4.497      -4.044
    Model_TharDI                 -4.5283      0.172    -26.377      0.000      -4.865      -4.192
    Model_Tiago1.2               -4.7574      0.102    -46.745      0.000      -4.957      -4.558
    Model_TiagoAMT             1.262e-12   5.72e-13      2.205      0.027     1.4e-13    2.38e-12
    Model_TiagoWizz            3.105e-12   6.59e-13      4.714      0.000    1.81e-12     4.4e-12
    Model_Tigor1.05              -4.4467      0.173    -25.706      0.000      -4.786      -4.108
    Model_Tigor1.2               -4.6836      0.134    -34.861      0.000      -4.947      -4.420
    Model_TigorXE             -1.755e-12   1.03e-12     -1.698      0.090   -3.78e-12    2.72e-13
    Model_Tiguan2.0              -6.3560      0.193    -32.863      0.000      -6.735      -5.977
    Model_Tucson2.0              -2.5898      0.166    -15.626      0.000      -2.915      -2.265
    Model_TucsonCRDi             -2.7089      0.170    -15.939      0.000      -3.042      -2.376
    Model_V40Cross              -21.8287      0.372    -58.666      0.000     -22.558     -21.099
    Model_V40D3                 -21.9063      0.364    -60.262      0.000     -22.619     -21.194
    Model_Vento1.2             5.416e-13    7.5e-13      0.722      0.470   -9.29e-13    2.01e-12
    Model_Vento1.5               -7.3840      0.122    -60.486      0.000      -7.623      -7.145
    Model_Vento1.6               -7.4029      0.128    -57.851      0.000      -7.654      -7.152
    Model_Vento2013-2015         -7.4946      0.159    -47.015      0.000      -7.807      -7.182
    Model_VentoDiesel            -7.4081      0.119    -62.234      0.000      -7.641      -7.175
    Model_VentoIPL               -7.6005      0.156    -48.819      0.000      -7.906      -7.295
    Model_VentoKonekt            -7.2200      0.190    -37.930      0.000      -7.593      -6.847
    Model_VentoMagnific          -7.3943      0.190    -38.885      0.000      -7.767      -7.021
    Model_VentoPetrol            -7.4415      0.120    -62.139      0.000      -7.676      -7.207
    Model_VentoSport             -7.3029      0.158    -46.343      0.000      -7.612      -6.994
    Model_VentoTSI             2.329e-12   8.15e-13      2.859      0.004    7.32e-13    3.93e-12
    Model_VentureEX              -4.7516      0.180    -26.436      0.000      -5.104      -4.399
    Model_Verito1.5              -4.7821      0.118    -40.456      0.000      -5.014      -4.550
    Model_Verna1.4               -3.3457      0.087    -38.623      0.000      -3.515      -3.176
    Model_Verna1.6               -3.3032      0.062    -52.940      0.000      -3.426      -3.181
    Model_VernaCRDi              -3.4769      0.069    -50.095      0.000      -3.613      -3.341
    Model_VernaSX                -3.3096      0.083    -39.665      0.000      -3.473      -3.146
    Model_VernaTransform         -3.6204      0.081    -44.424      0.000      -3.780      -3.461
    Model_VernaVTVT              -3.2389      0.079    -40.793      0.000      -3.395      -3.083
    Model_VernaXXi               -3.6599      0.162    -22.549      0.000      -3.978      -3.342
    Model_VernaXi                -3.8194      0.163    -23.415      0.000      -4.139      -3.500
    Model_VersaDX2                0.0337      0.228      0.148      0.883      -0.414       0.482
    Model_VitaraBrezza            0.3896      0.161      2.423      0.015       0.074       0.705
    Model_WR-VEdge               -4.1562      0.168    -24.778      0.000      -4.485      -3.827
    Model_WRVi-VTEC              -3.9613      0.129    -30.816      0.000      -4.213      -3.709
    Model_WagonR                 -0.1470      0.158     -0.930      0.352      -0.457       0.163
    Model_X-TrailSLX            -12.5613      0.233    -53.813      0.000     -13.019     -12.104
    Model_X1M                   -10.0304      0.190    -52.674      0.000     -10.404      -9.657
    Model_X1sDrive              -10.2879      0.164    -62.667      0.000     -10.610      -9.966
    Model_X1sDrive20d           -10.3399      0.166    -62.214      0.000     -10.666     -10.014
    Model_X1xDrive               -9.9258      0.218    -45.568      0.000     -10.353      -9.499
    Model_X3xDrive               -9.8674      0.177    -55.888      0.000     -10.214      -9.521
    Model_X3xDrive20d           -10.0053      0.171    -58.548      0.000     -10.340      -9.670
    Model_X3xDrive30d            -9.9052      0.218    -45.362      0.000     -10.333      -9.477
    Model_X52014-2019            -9.6152      0.183    -52.491      0.000      -9.974      -9.256
    Model_X53.0d                 -9.8939      0.177    -55.899      0.000     -10.241      -9.547
    Model_X5X5                   -9.4532      0.191    -49.367      0.000      -9.829      -9.078
    Model_X5xDrive               -9.6457      0.168    -57.344      0.000      -9.975      -9.316
    Model_X6xDrive               -9.3552      0.182    -51.413      0.000      -9.712      -8.998
    Model_X6xDrive30d            -9.5430      0.189    -50.375      0.000      -9.914      -9.172
    Model_XC60D4                -21.8700      0.362    -60.497      0.000     -22.579     -21.161
    Model_XC60D5                -21.8938      0.356    -61.539      0.000     -22.591     -21.196
    Model_XC902007-2015         -21.6101      0.381    -56.792      0.000     -22.356     -20.864
    Model_XE2.0L              -6.131e-13   4.43e-13     -1.384      0.166   -1.48e-12    2.56e-13
    Model_XEPortfolio         -4.061e-13   2.33e-13     -1.743      0.081   -8.63e-13    5.06e-14
    Model_XF2.0                 -19.7848      0.346    -57.132      0.000     -20.464     -19.106
    Model_XF2.2                 -19.8963      0.318    -62.619      0.000     -20.519     -19.273
    Model_XF3.0                 -20.1215      0.318    -63.354      0.000     -20.744     -19.499
    Model_XFAero                -19.5800      0.339    -57.700      0.000     -20.245     -18.915
    Model_XFDiesel              -20.0804      0.322    -62.355      0.000     -20.712     -19.449
    Model_XJ2.0L                -19.2596      0.348    -55.405      0.000     -19.941     -18.578
    Model_XJ3.0L                -19.4028      0.326    -59.570      0.000     -20.041     -18.764
    Model_XJ5.0                 -19.5345      0.344    -56.745      0.000     -20.210     -18.860
    Model_XUV300W8               -3.9942      0.171    -23.300      0.000      -4.330      -3.658
    Model_XUV500AT               -3.9524      0.093    -42.556      0.000      -4.134      -3.770
    Model_XUV500W10              -3.8637      0.087    -44.164      0.000      -4.035      -3.692
    Model_XUV500W4               -4.1634      0.116    -36.038      0.000      -4.390      -3.937
    Model_XUV500W6               -4.0692      0.090    -45.260      0.000      -4.245      -3.893
    Model_XUV500W7               -4.1544      0.171    -24.323      0.000      -4.489      -3.820
    Model_XUV500W8               -4.0323      0.077    -52.450      0.000      -4.183      -3.882
    Model_XUV500W9                -4e-13   3.43e-13     -1.167      0.243   -1.07e-12    2.72e-13
    Model_Xcent1.1               -3.6344      0.075    -48.770      0.000      -3.781      -3.488
    Model_Xcent1.2               -3.6668      0.068    -53.637      0.000      -3.801      -3.533
    Model_XenonXT                -4.5936      0.122    -37.769      0.000      -4.832      -4.355
    Model_XyloD2                 -5.0120      0.134    -37.493      0.000      -5.274      -4.750
    Model_XyloD4                 -4.6961      0.104    -45.327      0.000      -4.899      -4.493
    Model_XyloE2                 -4.7978      0.169    -28.452      0.000      -5.128      -4.467
    Model_XyloE4                 -4.5368      0.131    -34.708      0.000      -4.793      -4.281
    Model_XyloE8                 -4.5919      0.118    -39.042      0.000      -4.823      -4.361
    Model_XyloH4                 -4.3561      0.172    -25.343      0.000      -4.693      -4.019
    Model_YetiAmbition           -5.5909      0.130    -43.096      0.000      -5.845      -5.337
    Model_YetiElegance           -5.4991      0.130    -42.151      0.000      -5.755      -5.243
    Model_Z42009-2013            -9.5870      0.194    -49.443      0.000      -9.967      -9.207
    Model_ZenEstilo              -0.3639      0.166     -2.190      0.029      -0.690      -0.038
    Model_ZenLX                  -0.3016      0.192     -1.574      0.116      -0.677       0.074
    Model_ZenLXI               2.961e-17   1.73e-17      1.714      0.087   -4.26e-18    6.35e-17
    Model_ZenLXi                 -0.2961      0.181     -1.639      0.101      -0.650       0.058
    Model_ZenVX                   0.0019      0.221      0.009      0.993      -0.431       0.435
    Model_ZenVXI                 -0.2790      0.181     -1.541      0.123      -0.634       0.076
    Model_ZenVXi                 -0.1740      0.221     -0.788      0.431      -0.607       0.259
    Model_ZestQuadrajet          -4.6607      0.113    -41.395      0.000      -4.881      -4.440
    Model_ZestRevotron           -4.5122      0.099    -45.371      0.000      -4.707      -4.317
    Model_i10Asta                -3.5636      0.090    -39.566      0.000      -3.740      -3.387
    Model_i10Era                 -3.7607      0.065    -57.629      0.000      -3.889      -3.633
    Model_i10Magna               -3.6895      0.061    -60.302      0.000      -3.810      -3.570
    Model_i10Magna(O)            -3.6090      0.162    -22.231      0.000      -3.927      -3.291
    Model_i10Sportz              -3.7076      0.061    -60.603      0.000      -3.828      -3.588
    Model_i201.2                 -3.5059      0.064    -55.086      0.000      -3.631      -3.381
    Model_i201.4                 -3.5404      0.067    -52.771      0.000      -3.672      -3.409
    Model_i202015-2017           -3.5293      0.080    -44.362      0.000      -3.685      -3.373
    Model_i20Active              -3.4474      0.078    -44.056      0.000      -3.601      -3.294
    Model_i20Asta                -3.3911      0.065    -51.836      0.000      -3.519      -3.263
    Model_i20Diesel              -3.4774      0.166    -20.945      0.000      -3.803      -3.152
    Model_i20Era                 -3.6378      0.163    -22.263      0.000      -3.958      -3.317
    Model_i20Magna               -3.5601      0.066    -53.850      0.000      -3.690      -3.430
    Model_i20Sportz              -3.4857      0.064    -54.491      0.000      -3.611      -3.360
    Model_redi-GOS              -28.4620      0.466    -61.044      0.000     -29.376     -27.548
    Model_redi-GOT              -28.4673      0.450    -63.209      0.000     -29.350     -27.584
    ==============================================================================
    Omnibus:                      729.914   Durbin-Watson:                   1.976
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11194.465
    Skew:                          -0.350   Prob(JB):                         0.00
    Kurtosis:                      10.957   Cond. No.                     1.68e+21
    ==============================================================================
    
    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    [2] The smallest eigenvalue is 1.02e-32. This might indicate that there are
    strong multicollinearity problems or that the design matrix is singular.
    
    In [35]:
    get_model_score(olsmodel1)
    
    R-square on training set :  0.9632326167142701
    R-square on test set :  -1.7677825405890375e+83
    RMSE on training set :  2.1857914088403443
    RMSE on test set :  4.4582786612795244e+42
    
    Out[35]:
    [0.9632326167142701,
     -1.7677825405890375e+83,
     2.1857914088403443,
     4.4582786612795244e+42]
    In [36]:
    # Retrive Coeff values, p-values and store them in the dataframe
    olsmod = pd.DataFrame(olsmodel1.params, columns = ['coef'])
    olsmod['pval'] = olsmodel1.pvalues
    
    In [91]:
    # We are looking for overall significant variables
    
    pval_filter = olsmod['pval']<= 0.05
    imp_vars = olsmod[pval_filter].index.tolist()
    
    # We are going to get overall varaibles (un-one-hot encoded varables) from categorical varaibles
    sig_var = []
    for col in imp_vars:
        if '' in col:
            first_part = col.split('_')[0]
            for c in cars_data.columns:
                if first_part in c and c not in sig_var :
                    sig_var.append(c)
    
                    
    start = '\033[1m'
    end = '\033[95m'
    print(start+ 'Most overall significant categorical variables of LINEAR REGRESSION  are ' +end,':\n', sig_var)
    
    Most overall significant categorical variables of LINEAR REGRESSION  are  :
     ['Year', 'Mileage', 'Power', 'kilometers_driven_log', 'Location', 'Fuel_Type', 'Transmission', 'Owner_Type', 'Brand', 'Model']
    

    LR - Statsmodel 2 - Adjusted R2¶

    In [39]:
    #import statsmodels.api as sm - done before
    # Statsmodel api does not add a constant by default. We need to add it explicitly
    x_train = sm.add_constant(X_train)
    
    # Add constant to test data
    x_test = sm.add_constant(X_test)
    
    def build_ols_model(train):
        
        # Create the model
        olsmodel = sm.OLS(y_train["price_log"], train)
        
        return olsmodel.fit()
    
    # Fit linear model on new dataset
    olsmodel2 = build_ols_model(X_train)
    
    print(olsmodel2.summary())
    
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:              price_log   R-squared:                       0.973
    Model:                            OLS   Adj. R-squared:                  0.969
    Method:                 Least Squares   F-statistic:                     204.2
    Date:                Thu, 02 Feb 2023   Prob (F-statistic):               0.00
    Time:                        12:22:34   Log-Likelihood:                 2207.0
    No. Observations:                4211   AIC:                            -3134.
    Df Residuals:                    3571   BIC:                             927.1
    Df Model:                         639                                         
    Covariance Type:            nonrobust                                         
    =============================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
    ---------------------------------------------------------------------------------------------
    Year                          0.0997      0.002     65.021      0.000       0.097       0.103
    Mileage                      -0.0039      0.002     -2.493      0.013      -0.007      -0.001
    Engine                     9.227e-06    2.9e-05      0.318      0.750   -4.76e-05     6.6e-05
    Power                         0.0014      0.000      4.087      0.000       0.001       0.002
    Seats                         0.0118      0.019      0.608      0.543      -0.026       0.050
    kilometers_driven_log        -0.0760      0.005    -14.577      0.000      -0.086      -0.066
    Location_Bangalore            0.1733      0.017     10.163      0.000       0.140       0.207
    Location_Chennai              0.0485      0.016      2.987      0.003       0.017       0.080
    Location_Coimbatore           0.1419      0.015      9.177      0.000       0.112       0.172
    Location_Delhi               -0.0932      0.016     -5.950      0.000      -0.124      -0.062
    Location_Hyderabad            0.1470      0.015      9.765      0.000       0.117       0.177
    Location_Jaipur              -0.0289      0.017     -1.745      0.081      -0.061       0.004
    Location_Kochi               -0.0098      0.015     -0.630      0.529      -0.040       0.021
    Location_Kolkata             -0.2266      0.016    -14.190      0.000      -0.258      -0.195
    Location_Mumbai              -0.0768      0.015     -5.106      0.000      -0.106      -0.047
    Location_Pune                -0.0337      0.016     -2.151      0.032      -0.064      -0.003
    Fuel_Type_Diesel              0.0166      0.031      0.530      0.596      -0.045       0.078
    Fuel_Type_LPG                -0.0655      0.076     -0.859      0.390      -0.215       0.084
    Fuel_Type_Petrol             -0.0912      0.032     -2.852      0.004      -0.154      -0.028
    Transmission_Manual          -0.0960      0.010     -9.375      0.000      -0.116      -0.076
    Owner_Type_Fourth & Above    -0.0864      0.074     -1.172      0.241      -0.231       0.058
    Owner_Type_Second            -0.0514      0.008     -6.637      0.000      -0.067      -0.036
    Owner_Type_Third             -0.1223      0.021     -5.920      0.000      -0.163      -0.082
    Brand_Audi                 -190.4469      3.009    -63.282      0.000    -196.347    -184.546
    Brand_BMW                  -187.0584      2.958    -63.244      0.000    -192.857    -181.259
    Brand_Bentley               -97.9801      1.556    -62.987      0.000    -101.030     -94.930
    Brand_Chevrolet            -189.1596      2.963    -63.834      0.000    -194.969    -183.350
    Brand_Datsun               -170.6479      2.668    -63.953      0.000    -175.879    -165.416
    Brand_Fiat                 -183.4456      2.871    -63.896      0.000    -189.075    -177.817
    Brand_Force                 -98.8717      1.560    -63.395      0.000    -101.930     -95.814
    Brand_Ford                 -192.2491      3.017    -63.715      0.000    -198.165    -186.333
    Brand_Honda                -194.2222      3.048    -63.726      0.000    -200.198    -188.247
    Brand_Hyundai              -194.8630      3.054    -63.807      0.000    -200.851    -188.875
    Brand_Isuzu                 -99.0671      1.560    -63.498      0.000    -102.126     -96.008
    Brand_Jaguar               -176.9958      2.802    -63.174      0.000    -182.489    -171.503
    Brand_Jeep                  -98.8187      1.559    -63.395      0.000    -101.875     -95.762
    Brand_Lamborghini           -97.7633      1.557    -62.777      0.000    -100.817     -94.710
    Brand_Land Rover           -147.6478      2.338    -63.164      0.000    -152.231    -143.065
    Brand_Mahindra             -193.8566      3.047    -63.621      0.000    -199.831    -187.882
    Brand_Maruti               -198.5188      3.096    -64.122      0.000    -204.589    -192.449
    Brand_Mercedes-Benz        -191.4421      3.026    -63.258      0.000    -197.376    -185.509
    Brand_Mini Cooper          -172.4000      2.723    -63.312      0.000    -177.739    -167.061
    Brand_Mitsubishi           -173.0238      2.722    -63.565      0.000    -178.361    -167.687
    Brand_Nissan               -185.1204      2.905    -63.729      0.000    -190.816    -179.425
    Brand_Porsche              -175.0380      2.768    -63.240      0.000    -180.465    -169.611
    Brand_Renault              -188.5606      2.957    -63.770      0.000    -194.358    -182.763
    Brand_Skoda                -192.2906      3.018    -63.721      0.000    -198.207    -186.374
    Brand_Smart                 -99.1499      1.556    -63.725      0.000    -102.200     -96.099
    Brand_Tata                 -194.0216      3.035    -63.925      0.000    -199.972    -188.071
    Brand_Toyota               -191.5153      3.012    -63.582      0.000    -197.421    -185.610
    Brand_Volkswagen           -190.8784      2.997    -63.682      0.000    -196.755    -185.002
    Brand_Volvo                -175.3870      2.768    -63.362      0.000    -180.814    -169.960
    Model_1Series               -10.4063      0.188    -55.323      0.000     -10.775     -10.037
    Model_3Series               -10.2546      0.155    -66.004      0.000     -10.559      -9.950
    Model_5Series                -9.9958      0.157    -63.744      0.000     -10.303      -9.688
    Model_6Series                -9.3836      0.174    -53.847      0.000      -9.725      -9.042
    Model_7Series                -9.5631      0.166    -57.505      0.000      -9.889      -9.237
    Model_800AC                  -0.7719      0.177     -4.371      0.000      -1.118      -0.426
    Model_800DX                1.018e-11    8.9e-13     11.441      0.000    8.44e-12    1.19e-11
    Model_800Std                 -0.5967      0.193     -3.097      0.002      -0.974      -0.219
    Model_A-StarAT               -0.2263      0.192     -1.178      0.239      -0.603       0.150
    Model_A-StarLxi              -0.1284      0.181     -0.710      0.478      -0.483       0.226
    Model_A-StarVxi              -0.0980      0.172     -0.571      0.568      -0.435       0.239
    Model_A335                   -6.9104      0.132    -52.516      0.000      -7.168      -6.652
    Model_A41.8                  -6.8775      0.149    -46.248      0.000      -7.169      -6.586
    Model_A42.0                  -6.8241      0.107    -63.648      0.000      -7.034      -6.614
    Model_A43.0                  -7.0016      0.135    -51.879      0.000      -7.266      -6.737
    Model_A43.2                8.799e-12   1.29e-12      6.820      0.000    6.27e-12    1.13e-11
    Model_A430                   -6.8209      0.186    -36.671      0.000      -7.186      -6.456
    Model_A435                   -6.8321      0.123    -55.571      0.000      -7.073      -6.591
    Model_A4New                  -6.9809      0.136    -51.180      0.000      -7.248      -6.713
    Model_A62.0                  -6.2528      0.185    -33.880      0.000      -6.615      -5.891
    Model_A62.7                  -6.8744      0.129    -53.346      0.000      -7.127      -6.622
    Model_A62.8                  -6.7536      0.182    -37.179      0.000      -7.110      -6.397
    Model_A62011-2015            -6.6467      0.111    -59.636      0.000      -6.865      -6.428
    Model_A63.0                  -6.9279      0.134    -51.887      0.000      -7.190      -6.666
    Model_A635                   -6.4954      0.127    -51.293      0.000      -6.744      -6.247
    Model_A72011-2015            -6.1878      0.188    -32.959      0.000      -6.556      -5.820
    Model_A8L                    -5.7975      0.184    -31.523      0.000      -6.158      -5.437
    Model_AClass                 -5.9945      0.114    -52.575      0.000      -6.218      -5.771
    Model_AccentCRDi             -3.9628      0.120    -32.909      0.000      -4.199      -3.727
    Model_AccentExecutive      3.258e-11   3.66e-12      8.890      0.000    2.54e-11    3.98e-11
    Model_AccentGLE              -3.9540      0.071    -55.455      0.000      -4.094      -3.814
    Model_AccentGLS              -3.8569      0.103    -37.375      0.000      -4.059      -3.655
    Model_Accord2.4              -3.7927      0.078    -48.905      0.000      -3.945      -3.641
    Model_Accord2001-2003        -3.6696      0.127    -28.955      0.000      -3.918      -3.421
    Model_AccordV6               -4.1531      0.168    -24.789      0.000      -4.482      -3.825
    Model_AccordVTi-L            -4.0594      0.128    -31.711      0.000      -4.310      -3.808
    Model_Alto800                -0.4585      0.160     -2.862      0.004      -0.773      -0.144
    Model_AltoGreen              -0.4692      0.223     -2.102      0.036      -0.907      -0.031
    Model_AltoK10                -0.3591      0.160     -2.247      0.025      -0.673      -0.046
    Model_AltoLX                 -0.6173      0.192     -3.215      0.001      -0.994      -0.241
    Model_AltoLXI              -3.96e-12   8.81e-13     -4.493      0.000   -5.69e-12   -2.23e-12
    Model_AltoLXi                -0.2642      0.160     -1.648      0.099      -0.579       0.050
    Model_AltoStd                -0.2983      0.191     -1.558      0.119      -0.674       0.077
    Model_AltoVXi             -5.404e-12   6.62e-13     -8.165      0.000    -6.7e-12   -4.11e-12
    Model_AltoVxi                 0.1386      0.221      0.626      0.531      -0.295       0.572
    Model_AmazeE                 -4.2098      0.085    -49.309      0.000      -4.377      -4.042
    Model_AmazeEX                -4.3100      0.110    -39.351      0.000      -4.525      -4.095
    Model_AmazeS                 -4.2362      0.071    -59.441      0.000      -4.376      -4.096
    Model_AmazeSX                -4.2757      0.101    -42.256      0.000      -4.474      -4.077
    Model_AmazeV                 -4.2274      0.127    -33.191      0.000      -4.477      -3.978
    Model_AmazeVX                -4.1739      0.077    -54.159      0.000      -4.325      -4.023
    Model_Ameo1.2                -7.6078      0.128    -59.497      0.000      -7.859      -7.357
    Model_Ameo1.5                -7.5577      0.147    -51.337      0.000      -7.846      -7.269
    Model_AspireAmbiente         -6.2611      0.181    -34.533      0.000      -6.617      -5.906
    Model_AspireTitanium         -5.9793      0.130    -45.825      0.000      -6.235      -5.723
    Model_Aveo1.4                -9.5678      0.170    -56.322      0.000      -9.901      -9.235
    Model_Aveo1.6                -9.7178      0.209    -46.450      0.000     -10.128      -9.308
    Model_AveoU-VA               -9.7213      0.160    -60.940      0.000     -10.034      -9.408
    Model_AvventuraMULTIJET     -14.9094      0.259    -57.580      0.000     -15.417     -14.402
    Model_BClass                 -6.0582      0.105    -57.530      0.000      -6.265      -5.852
    Model_BR-Vi-DTEC           6.159e-13   8.54e-13      0.721      0.471   -1.06e-12    2.29e-12
    Model_BR-Vi-VTEC             -4.0353      0.134    -30.145      0.000      -4.298      -3.773
    Model_BRVi-VTEC              -3.9152      0.104    -37.533      0.000      -4.120      -3.711
    Model_BalenoAlpha             0.2921      0.163      1.793      0.073      -0.027       0.612
    Model_BalenoDelta             0.1067      0.166      0.644      0.520      -0.218       0.431
    Model_BalenoLXI              -0.4686      0.192     -2.438      0.015      -0.845      -0.092
    Model_BalenoRS                0.2623      0.176      1.490      0.136      -0.083       0.607
    Model_BalenoSigma             0.0793      0.182      0.436      0.663      -0.277       0.436
    Model_BalenoVxi              -0.3108      0.192     -1.616      0.106      -0.688       0.066
    Model_BalenoZeta              0.1698      0.164      1.035      0.301      -0.152       0.492
    Model_BeatDiesel             -9.7274      0.154    -62.962      0.000     -10.030      -9.425
    Model_BeatLS                 -9.6412      0.159    -60.686      0.000      -9.953      -9.330
    Model_BeatLT                 -9.6850      0.155    -62.370      0.000      -9.989      -9.381
    Model_BeatOption           2.326e-12   7.66e-13      3.038      0.002    8.25e-13    3.83e-12
    Model_Beetle2.0              1.4e-12      8e-13      1.749      0.080   -1.69e-13    2.97e-12
    Model_BoleroDI               -4.3241      0.168    -25.667      0.000      -4.654      -3.994
    Model_BoleroSLE              -4.6119      0.170    -27.160      0.000      -4.945      -4.279
    Model_BoleroSLX              -4.3986      0.169    -25.971      0.000      -4.731      -4.067
    Model_BoleroVLX              -4.3062      0.170    -25.279      0.000      -4.640      -3.972
    Model_BoleroZLX              -4.3942      0.098    -44.822      0.000      -4.586      -4.202
    Model_BoleromHAWK          3.808e-12   7.56e-13      5.040      0.000    2.33e-12    5.29e-12
    Model_BoltQuadrajet          -4.6461      0.136    -34.071      0.000      -4.913      -4.379
    Model_BoltRevotron           -4.8586      0.172    -28.265      0.000      -5.196      -4.522
    Model_BoxsterS            -1.907e-12   7.73e-13     -2.468      0.014   -3.42e-12   -3.92e-13
    Model_Brio1.2                -4.3234      0.101    -42.859      0.000      -4.521      -4.126
    Model_BrioE                  -4.3443      0.166    -26.130      0.000      -4.670      -4.018
    Model_BrioEX                 -4.4702      0.167    -26.704      0.000      -4.798      -4.142
    Model_BrioS                  -4.3564      0.071    -60.978      0.000      -4.496      -4.216
    Model_BrioV                  -4.3053      0.090    -48.050      0.000      -4.481      -4.130
    Model_BrioVX                 -4.3074      0.088    -48.723      0.000      -4.481      -4.134
    Model_C-ClassProgressive     -5.8321      0.148    -39.441      0.000      -6.122      -5.542
    Model_CLA200                 -5.8348      0.104    -55.887      0.000      -6.039      -5.630
    Model_CLS-Class2006-2010     -5.2682      0.175    -30.183      0.000      -5.610      -4.926
    Model_CR-V2.0              2.984e-12   8.38e-13      3.559      0.000    1.34e-12    4.63e-12
    Model_CR-V2.0L               -3.4664      0.095    -36.526      0.000      -3.652      -3.280
    Model_CR-V2.4                -3.6665      0.090    -40.900      0.000      -3.842      -3.491
    Model_CR-V2.4L               -3.3975      0.101    -33.706      0.000      -3.595      -3.200
    Model_CR-VAT              -5.736e-13   8.67e-13     -0.662      0.508   -2.27e-12    1.13e-12
    Model_CR-VPetrol          -2.473e-12   6.82e-13     -3.626      0.000   -3.81e-12   -1.14e-12
    Model_CR-VRVi                -3.3289      0.126    -26.476      0.000      -3.575      -3.082
    Model_CR-VSport              -3.1921      0.168    -19.021      0.000      -3.521      -2.863
    Model_Camry2.5               -5.5548      0.186    -29.843      0.000      -5.920      -5.190
    Model_CamryA/T            -8.829e-13   5.96e-13     -1.482      0.138   -2.05e-12    2.85e-13
    Model_CamryHybrid            -5.5055      0.138    -40.038      0.000      -5.775      -5.236
    Model_CamryW2                -6.5387      0.182    -35.919      0.000      -6.896      -6.182
    Model_CamryW4                -6.6148      0.182    -36.353      0.000      -6.972      -6.258
    Model_CaptivaLT           -2.426e-13    8.2e-13     -0.296      0.767   -1.85e-12    1.37e-12
    Model_CaptivaLTZ             -9.1088      0.212    -42.900      0.000      -9.525      -8.693
    Model_Captur1.5              -9.4996      0.189    -50.314      0.000      -9.870      -9.129
    Model_Cayenne2009-2014      -21.8549      0.362    -60.438      0.000     -22.564     -21.146
    Model_CayenneBase           -25.2074      0.380    -66.343      0.000     -25.952     -24.462
    Model_CayenneDiesel         -21.4012      0.377    -56.837      0.000     -22.139     -20.663
    Model_CayenneS              -21.6049      0.380    -56.876      0.000     -22.350     -20.860
    Model_CayenneTurbo          -21.6669      0.378    -57.262      0.000     -22.409     -20.925
    Model_Cayman2009-2012       -21.0656      0.376    -56.016      0.000     -21.803     -20.328
    Model_CediaSports          1.898e-12   8.27e-13      2.296      0.022    2.77e-13    3.52e-12
    Model_CelerioCNG             -0.0125      0.224     -0.056      0.955      -0.451       0.426
    Model_CelerioLDi             -0.4440      0.222     -2.001      0.046      -0.879      -0.009
    Model_CelerioLXI             -0.1513      0.182     -0.831      0.406      -0.508       0.205
    Model_CelerioVXI             -0.1642      0.161     -1.020      0.308      -0.480       0.151
    Model_CelerioZDi             -0.1584      0.222     -0.713      0.476      -0.594       0.277
    Model_CelerioZXI             -0.0793      0.164     -0.484      0.629      -0.401       0.242
    Model_Ciaz1.3                 0.3287      0.173      1.901      0.057      -0.010       0.668
    Model_Ciaz1.4                 0.4380      0.177      2.481      0.013       0.092       0.784
    Model_CiazAT                  0.3220      0.193      1.670      0.095      -0.056       0.700
    Model_CiazAlpha            3.153e-12   9.59e-13      3.287      0.001    1.27e-12    5.03e-12
    Model_CiazRS                  0.5411      0.222      2.436      0.015       0.106       0.977
    Model_CiazVDI                 0.2534      0.182      1.393      0.164      -0.103       0.610
    Model_CiazVDi                 0.3388      0.169      2.007      0.045       0.008       0.670
    Model_CiazVXi                 0.3646      0.176      2.066      0.039       0.019       0.711
    Model_CiazZDi                 0.4153      0.164      2.534      0.011       0.094       0.737
    Model_CiazZXi                 0.4435      0.170      2.602      0.009       0.109       0.778
    Model_CiazZeta                0.4020      0.193      2.083      0.037       0.024       0.780
    Model_City1.3                -4.1017      0.095    -43.123      0.000      -4.288      -3.915
    Model_City1.5                -4.0157      0.065    -61.754      0.000      -4.143      -3.888
    Model_CityCorporate          -3.9912      0.166    -24.090      0.000      -4.316      -3.666
    Model_CityV                  -3.9466      0.076    -52.148      0.000      -4.095      -3.798
    Model_CityZX                 -4.1572      0.073    -56.656      0.000      -4.301      -4.013
    Model_Cityi                  -3.8646      0.069    -55.892      0.000      -4.000      -3.729
    Model_Cityi-DTEC             -3.6114      0.128    -28.140      0.000      -3.863      -3.360
    Model_Cityi-VTEC             -3.8434      0.078    -49.402      0.000      -3.996      -3.691
    Model_Civic2006-2010         -4.0615      0.073    -55.839      0.000      -4.204      -3.919
    Model_Civic2010-2013         -4.1479      0.086    -47.979      0.000      -4.317      -3.978
    Model_Classic1.4           2.994e-12   6.37e-13      4.702      0.000    1.75e-12    4.24e-12
    Model_ClassicNova          -198.4473      3.106    -63.893      0.000    -204.537    -192.358
    Model_ClubmanCooper         -24.5038      0.417    -58.772      0.000     -25.321     -23.686
    Model_Compass1.4          -1.347e-12   7.93e-13     -1.698      0.090    -2.9e-12    2.09e-13
    Model_Compass2.0            -98.8187      1.559    -63.395      0.000    -101.875     -95.762
    Model_ContinentalFlying     -97.9801      1.556    -62.987      0.000    -101.030     -94.930
    Model_Cooper3               -24.6235      0.395    -62.333      0.000     -25.398     -23.849
    Model_Cooper5               -24.7427      0.397    -62.305      0.000     -25.521     -23.964
    Model_CooperConvertible     -24.4751      0.396    -61.837      0.000     -25.251     -23.699
    Model_CooperCountryman      -24.7224      0.401    -61.583      0.000     -25.509     -23.935
    Model_CooperS               -24.3721      0.400    -60.933      0.000     -25.156     -23.588
    Model_Corolla1.8             -6.1820      0.184    -33.655      0.000      -6.542      -5.822
    Model_CorollaAltis           -6.3750      0.106    -59.873      0.000      -6.584      -6.166
    Model_CorollaDX              -6.7961      0.177    -38.370      0.000      -7.143      -6.449
    Model_CorollaExecutive       -6.6601      0.180    -37.061      0.000      -7.012      -6.308
    Model_CorollaH2              -7.0250      0.180    -39.073      0.000      -7.378      -6.673
    Model_CorollaH4              -6.7367      0.122    -55.083      0.000      -6.976      -6.497
    Model_CorollaH5              -6.8531      0.145    -47.104      0.000      -7.138      -6.568
    Model_CountrymanCooper      -24.9604      0.417    -59.853      0.000     -25.778     -24.143
    Model_Creta1.4               -3.0382      0.088    -34.400      0.000      -3.211      -2.865
    Model_Creta1.6               -3.0055      0.066    -45.418      0.000      -3.135      -2.876
    Model_CrossPolo1.5           -7.5774      0.158    -47.815      0.000      -7.888      -7.267
    Model_CruzeLTZ               -9.0739      0.161    -56.266      0.000      -9.390      -8.758
    Model_D-MAXV-Cross          -99.0671      1.560    -63.498      0.000    -102.126     -96.008
    Model_Duster110PS            -9.5441      0.161    -59.357      0.000      -9.859      -9.229
    Model_Duster85PS             -9.6185      0.161    -59.782      0.000      -9.934      -9.303
    Model_DusterAdventure        -9.6232      0.217    -44.358      0.000     -10.049      -9.198
    Model_DusterPetrol           -9.8249      0.219    -44.916      0.000     -10.254      -9.396
    Model_DusterRXZ            7.797e-13   7.49e-13      1.041      0.298   -6.89e-13    2.25e-12
    Model_DzireAMT                0.1008      0.177      0.570      0.569      -0.246       0.448
    Model_DzireLDI                0.0997      0.223      0.448      0.654      -0.337       0.536
    Model_DzireNew                0.3045      0.222      1.369      0.171      -0.132       0.741
    Model_DzireVDI                0.2813      0.176      1.595      0.111      -0.065       0.627
    Model_DzireVXI                0.2521      0.181      1.390      0.165      -0.104       0.608
    Model_DzireZDI                0.2884      0.193      1.494      0.135      -0.090       0.667
    Model_E-Class200           3.363e-12   1.14e-12      2.940      0.003    1.12e-12    5.61e-12
    Model_E-Class2009-2013       -5.6691      0.094    -60.194      0.000      -5.854      -5.484
    Model_E-Class2015-2017       -5.5356      0.103    -53.498      0.000      -5.738      -5.333
    Model_E-Class220           9.172e-13   8.03e-13      1.143      0.253   -6.57e-13    2.49e-12
    Model_E-Class230             -5.9963      0.124    -48.173      0.000      -6.240      -5.752
    Model_E-Class250             -5.7455      0.167    -34.356      0.000      -6.073      -5.418
    Model_E-Class280             -5.9938      0.109    -55.188      0.000      -6.207      -5.781
    Model_E-ClassE               -5.2590      0.141    -37.223      0.000      -5.536      -4.982
    Model_E-ClassE250            -5.6625      0.111    -51.194      0.000      -5.879      -5.446
    Model_E-ClassE270            -5.8159      0.171    -33.989      0.000      -6.151      -5.480
    Model_E-ClassE350            -5.4948      0.177    -31.026      0.000      -5.842      -5.148
    Model_E-ClassE400            -5.0602      0.178    -28.381      0.000      -5.410      -4.711
    Model_E-ClassFacelift        -5.5342      0.179    -30.920      0.000      -5.885      -5.183
    Model_EON1.0              -1.499e-12   8.93e-13     -1.680      0.093   -3.25e-12    2.51e-13
    Model_EOND                   -3.9973      0.075    -53.633      0.000      -4.143      -3.851
    Model_EONEra                 -4.0156      0.073    -55.176      0.000      -4.158      -3.873
    Model_EONLPG              -4.633e-13   7.84e-13     -0.591      0.555      -2e-12    1.07e-12
    Model_EONMagna               -4.0661      0.079    -51.394      0.000      -4.221      -3.911
    Model_EONSportz              -4.0561      0.108    -37.581      0.000      -4.268      -3.844
    Model_EcoSport1.0            -5.8898      0.133    -44.183      0.000      -6.151      -5.628
    Model_EcoSport1.5            -5.9411      0.103    -57.553      0.000      -6.143      -5.739
    Model_Ecosport1.0            -5.8605      0.180    -32.604      0.000      -6.213      -5.508
    Model_Ecosport1.5            -5.8902      0.101    -58.596      0.000      -6.087      -5.693
    Model_EcosportSignature      -5.9285      0.147    -40.462      0.000      -6.216      -5.641
    Model_Eeco5                  -0.3819      0.178     -2.144      0.032      -0.731      -0.033
    Model_Eeco7                  -0.3282      0.172     -1.909      0.056      -0.665       0.009
    Model_EecoCNG              1.696e-12   8.65e-13      1.960      0.050   -1.23e-16    3.39e-12
    Model_EecoSmiles          -7.436e-14   6.05e-13     -0.123      0.902   -1.26e-12    1.11e-12
    Model_Elantra1.6             -3.0195      0.124    -24.297      0.000      -3.263      -2.776
    Model_Elantra2.0             -2.7589      0.126    -21.923      0.000      -3.006      -2.512
    Model_ElantraCRDi            -3.0443      0.076    -40.255      0.000      -3.193      -2.896
    Model_ElantraSX              -2.9220      0.165    -17.670      0.000      -3.246      -2.598
    Model_Elitei20               -3.4032      0.078    -43.355      0.000      -3.557      -3.249
    Model_Endeavour2.2           -5.0802      0.134    -37.931      0.000      -5.343      -4.818
    Model_Endeavour2.5L          -5.5773      0.137    -40.693      0.000      -5.846      -5.309
    Model_Endeavour3.0L          -5.6725      0.128    -44.488      0.000      -5.922      -5.422
    Model_Endeavour3.2           -5.0355      0.123    -40.974      0.000      -5.276      -4.794
    Model_Endeavour4x2           -5.5870      0.154    -36.214      0.000      -5.890      -5.285
    Model_EndeavourHurricane     -5.7893      0.153    -37.919      0.000      -6.089      -5.490
    Model_EndeavourTitanium   -7.985e-14   4.78e-13     -0.167      0.867   -1.02e-12    8.58e-13
    Model_EndeavourXLT           -5.4897      0.149    -36.882      0.000      -5.782      -5.198
    Model_Enjoy1.3               -9.3697      0.192    -48.893      0.000      -9.745      -8.994
    Model_Enjoy1.4               -9.3457      0.218    -42.903      0.000      -9.773      -8.919
    Model_EnjoyPetrol            -9.4752      0.214    -44.216      0.000      -9.895      -9.055
    Model_EnjoyTCDi              -9.4791      0.180    -52.708      0.000      -9.832      -9.127
    Model_ErtigaLXI               0.4891      0.225      2.172      0.030       0.048       0.931
    Model_ErtigaPaseo             0.2952      0.226      1.307      0.191      -0.148       0.738
    Model_ErtigaSHVS              0.3766      0.173      2.174      0.030       0.037       0.716
    Model_ErtigaVDI               0.4251      0.167      2.546      0.011       0.098       0.753
    Model_ErtigaVXI               0.4107      0.172      2.390      0.017       0.074       0.747
    Model_ErtigaZDI               0.4537      0.167      2.715      0.007       0.126       0.781
    Model_ErtigaZXI               0.4604      0.186      2.479      0.013       0.096       0.825
    Model_EsteemLX               -0.4936      0.222     -2.227      0.026      -0.928      -0.059
    Model_EsteemVxi              -0.4946      0.175     -2.828      0.005      -0.838      -0.152
    Model_EstiloLXI              -0.1249      0.181     -0.691      0.489      -0.479       0.229
    Model_Etios1.4            -1.591e-12   7.08e-13     -2.246      0.025   -2.98e-12   -2.02e-13
    Model_EtiosCross             -6.8298      0.130    -52.610      0.000      -7.084      -6.575
    Model_EtiosG                 -6.9041      0.124    -55.793      0.000      -7.147      -6.662
    Model_EtiosGD                -6.8827      0.124    -55.525      0.000      -7.126      -6.640
    Model_EtiosLiva              -7.0123      0.111    -63.387      0.000      -7.229      -6.795
    Model_EtiosPetrol            -6.8233      0.185    -36.954      0.000      -7.185      -6.461
    Model_EtiosV                 -6.8643      0.184    -37.367      0.000      -7.225      -6.504
    Model_EtiosVD                -6.5880      0.135    -48.863      0.000      -6.852      -6.324
    Model_EtiosVX                -6.8650      0.149    -45.942      0.000      -7.158      -6.572
    Model_EtiosVXD            -2.356e-14   9.31e-13     -0.025      0.980   -1.85e-12     1.8e-12
    Model_Evalia2013            -13.4928      0.261    -51.715      0.000     -14.004     -12.981
    Model_FType                 -19.3359      0.349    -55.456      0.000     -20.020     -18.652
    Model_Fabia1.2               -6.4629      0.115    -56.273      0.000      -6.688      -6.238
    Model_Fabia1.2L              -6.3157      0.177    -35.664      0.000      -6.663      -5.969
    Model_Fabia1.4               -6.0552      0.177    -34.241      0.000      -6.402      -5.708
    Model_Fabia1.6               -6.5606      0.179    -36.584      0.000      -6.912      -6.209
    Model_Fiesta1.4              -6.3168      0.101    -62.706      0.000      -6.514      -6.119
    Model_Fiesta1.5              -6.0775      0.180    -33.728      0.000      -6.431      -5.724
    Model_Fiesta1.6              -6.4219      0.119    -53.896      0.000      -6.655      -6.188
    Model_FiestaClassic          -6.4737      0.118    -54.844      0.000      -6.705      -6.242
    Model_FiestaDiesel           -5.7724      0.178    -32.492      0.000      -6.121      -5.424
    Model_FiestaEXi              -6.3823      0.143    -44.766      0.000      -6.662      -6.103
    Model_FiestaTitanium       3.366e-13   7.26e-13      0.464      0.643   -1.09e-12    1.76e-12
    Model_Figo1.2P             3.353e-13      6e-13      0.558      0.577   -8.42e-13    1.51e-12
    Model_Figo1.5D               -6.0593      0.141    -42.902      0.000      -6.336      -5.782
    Model_Figo2015-2019          -6.3651      0.113    -56.397      0.000      -6.586      -6.144
    Model_FigoAspire             -6.2831      0.113    -55.662      0.000      -6.504      -6.062
    Model_FigoDiesel             -6.4529      0.100    -64.730      0.000      -6.648      -6.257
    Model_FigoPetrol             -6.3359      0.105    -60.412      0.000      -6.541      -6.130
    Model_FigoTitanium           -6.6700      0.181    -36.926      0.000      -7.024      -6.316
    Model_Fluence1.5             -9.9159      0.215    -46.065      0.000     -10.338      -9.494
    Model_Fluence2.0             -9.6142      0.215    -44.796      0.000     -10.035      -9.193
    Model_FluenceDiesel          -9.7868      0.187    -52.206      0.000     -10.154      -9.419
    Model_Fortuner2.8            -5.7408      0.124    -46.118      0.000      -5.985      -5.497
    Model_Fortuner3.0            -5.8331      0.116    -50.390      0.000      -6.060      -5.606
    Model_Fortuner4x2            -5.7433      0.116    -49.371      0.000      -5.971      -5.515
    Model_Fortuner4x4            -5.7713      0.127    -45.605      0.000      -6.019      -5.523
    Model_FortunerTRD            -5.8636      0.186    -31.511      0.000      -6.228      -5.499
    Model_FortwoCDI             -99.1499      1.556    -63.725      0.000    -102.200     -96.099
    Model_FreestyleTitanium      -5.6613      0.140    -40.353      0.000      -5.936      -5.386
    Model_FusionPlus             -6.0965      0.177    -34.478      0.000      -6.443      -5.750
    Model_GL-Class2007           -5.1038      0.118    -43.332      0.000      -5.335      -4.873
    Model_GL-Class350            -5.1340      0.149    -34.357      0.000      -5.427      -4.841
    Model_GLAClass               -5.7046      0.104    -54.912      0.000      -5.908      -5.501
    Model_GLC220                 -5.4271      0.141    -38.385      0.000      -5.704      -5.150
    Model_GLC220d                -5.3741      0.141    -38.093      0.000      -5.651      -5.098
    Model_GLC43                  -5.3052      0.182    -29.170      0.000      -5.662      -4.949
    Model_GLE250d                -5.1583      0.119    -43.309      0.000      -5.392      -4.925
    Model_GLE350d                -5.2360      0.113    -46.495      0.000      -5.457      -5.015
    Model_GLS350d                -5.0848      0.151    -33.597      0.000      -5.382      -4.788
    Model_GONXT                 -28.3479      0.464    -61.029      0.000     -29.259     -27.437
    Model_GOPlus                -28.2127      0.461    -61.159      0.000     -29.117     -27.308
    Model_GOT                   -28.5319      0.466    -61.167      0.000     -29.446     -27.617
    Model_GallardoCoupe         -97.7633      1.557    -62.777      0.000    -100.817     -94.710
    Model_Getz1.3              4.582e-13   7.91e-13      0.579      0.562   -1.09e-12    2.01e-12
    Model_Getz1.5                -3.9572      0.163    -24.223      0.000      -4.278      -3.637
    Model_GetzGLE                -3.7616      0.102    -36.991      0.000      -3.961      -3.562
    Model_GetzGLS                -4.2214      0.092    -45.985      0.000      -4.401      -4.041
    Model_GetzGVS                -3.3482      0.162    -20.704      0.000      -3.665      -3.031
    Model_GrandVitara             0.5830      0.197      2.953      0.003       0.196       0.970
    Model_GrandePunto           -15.2656      0.257    -59.464      0.000     -15.769     -14.762
    Model_Grandi10               -3.6844      0.063    -58.926      0.000      -3.807      -3.562
    Model_HexaXT                 -3.7729      0.143    -26.309      0.000      -4.054      -3.492
    Model_HexaXTA                -3.7857      0.178    -21.261      0.000      -4.135      -3.437
    Model_Ignis1.2               -0.2619      0.193     -1.358      0.175      -0.640       0.116
    Model_Ignis1.3                0.0508      0.222      0.228      0.819      -0.385       0.487
    Model_Ikon1.3                -6.4491      0.110    -58.648      0.000      -6.665      -6.234
    Model_Ikon1.4                -7.2269      0.179    -40.318      0.000      -7.578      -6.875
    Model_Ikon1.6                -6.6108      0.173    -38.118      0.000      -6.951      -6.271
    Model_IndicaDLS              -5.3396      0.116    -46.008      0.000      -5.567      -5.112
    Model_IndicaGLS              -5.0438      0.169    -29.839      0.000      -5.375      -4.712
    Model_IndicaLEI              -4.9374      0.168    -29.340      0.000      -5.267      -4.607
    Model_IndicaV2               -5.2222      0.092    -57.000      0.000      -5.402      -5.043
    Model_IndicaVista            -4.9346      0.092    -53.536      0.000      -5.115      -4.754
    Model_IndigoCS               -4.9044      0.097    -50.510      0.000      -5.095      -4.714
    Model_IndigoGLE              -5.1794      0.133    -38.836      0.000      -5.441      -4.918
    Model_IndigoLS               -4.9978      0.106    -47.154      0.000      -5.206      -4.790
    Model_IndigoLX               -5.0369      0.109    -46.385      0.000      -5.250      -4.824
    Model_IndigoXL             -2.02e-12   9.41e-13     -2.147      0.032   -3.86e-12   -1.75e-13
    Model_IndigoeCS              -4.9951      0.105    -47.763      0.000      -5.200      -4.790
    Model_Innova2.0              -6.0566      0.134    -45.055      0.000      -6.320      -5.793
    Model_Innova2.5              -6.1405      0.114    -53.762      0.000      -6.364      -5.917
    Model_InnovaCrysta           -6.1272      0.118    -51.854      0.000      -6.359      -5.896
    Model_Jazz1.2                -4.1526      0.077    -54.278      0.000      -4.303      -4.003
    Model_Jazz1.5                -4.0671      0.085    -48.050      0.000      -4.233      -3.901
    Model_JazzActive             -4.3858      0.165    -26.652      0.000      -4.708      -4.063
    Model_JazzExclusive          -4.1900      0.167    -25.148      0.000      -4.517      -3.863
    Model_JazzMode               -4.3447      0.165    -26.382      0.000      -4.668      -4.022
    Model_JazzS                  -4.2829      0.124    -34.425      0.000      -4.527      -4.039
    Model_JazzSelect             -4.2641      0.125    -34.169      0.000      -4.509      -4.019
    Model_JazzV                  -4.0543      0.096    -42.359      0.000      -4.242      -3.867
    Model_JazzVX                 -4.1028      0.102    -40.267      0.000      -4.303      -3.903
    Model_JeepMM                 -4.2671      0.128    -33.320      0.000      -4.518      -4.016
    Model_Jetta2007-2011         -7.1305      0.128    -55.491      0.000      -7.382      -6.879
    Model_Jetta2012-2014         -6.9819      0.135    -51.810      0.000      -7.246      -6.718
    Model_Jetta2013-2015         -6.9901      0.130    -53.585      0.000      -7.246      -6.734
    Model_KUV100                 -4.8161      0.093    -51.601      0.000      -4.999      -4.633
    Model_KWID1.0               -10.3027      0.172    -59.733      0.000     -10.641      -9.964
    Model_KWIDAMT               -10.5141      0.217    -48.431      0.000     -10.940     -10.088
    Model_KWIDClimber           -10.3249      0.175    -58.880      0.000     -10.669      -9.981
    Model_KWIDRXL               -10.6211      0.188    -56.634      0.000     -10.989     -10.253
    Model_KWIDRXT               -10.3963      0.161    -64.647      0.000     -10.712     -10.081
    Model_Koleos2.0              -9.2702      0.181    -51.192      0.000      -9.625      -8.915
    Model_Lancer1.5             -25.4607      0.404    -63.075      0.000     -26.252     -24.669
    Model_LancerGLXD            -24.6319      0.407    -60.487      0.000     -25.430     -23.833
    Model_Laura1.8               -5.9655      0.129    -46.238      0.000      -6.218      -5.713
    Model_Laura1.9               -5.7571      0.120    -48.087      0.000      -5.992      -5.522
    Model_LauraAmbiente          -5.7958      0.115    -50.222      0.000      -6.022      -5.570
    Model_LauraAmbition          -5.7674      0.128    -44.972      0.000      -6.019      -5.516
    Model_LauraClassic           -6.0986      0.178    -34.311      0.000      -6.447      -5.750
    Model_LauraElegance          -5.8082      0.127    -45.618      0.000      -6.058      -5.559
    Model_LauraL                 -6.0911      0.141    -43.049      0.000      -6.369      -5.814
    Model_LauraRS                -5.6934      0.180    -31.668      0.000      -6.046      -5.341
    Model_Linea1.3              -14.9663      0.284    -52.623      0.000     -15.524     -14.409
    Model_LineaClassic          -15.3956      0.284    -54.134      0.000     -15.953     -14.838
    Model_LineaEmotion          -15.0820      0.257    -58.573      0.000     -15.587     -14.577
    Model_LineaT                -15.4314      0.282    -54.735      0.000     -15.984     -14.879
    Model_LineaT-Jet            -14.9673      0.285    -52.565      0.000     -15.526     -14.409
    Model_Lodgy110PS             -9.6851      0.201    -48.158      0.000     -10.079      -9.291
    Model_LoganDiesel            -4.9742      0.173    -28.832      0.000      -5.312      -4.636
    Model_LoganPetrol            -5.1021      0.170    -29.953      0.000      -5.436      -4.768
    Model_M-ClassML              -5.4614      0.098    -55.736      0.000      -5.654      -5.269
    Model_MUX4WD              -3.281e-12   8.43e-13     -3.889      0.000   -4.93e-12   -1.63e-12
    Model_ManzaAqua              -4.9587      0.116    -42.575      0.000      -5.187      -4.730
    Model_ManzaAura              -4.9780      0.108    -46.159      0.000      -5.189      -4.767
    Model_ManzaClub              -4.8096      0.171    -28.121      0.000      -5.145      -4.474
    Model_ManzaELAN              -4.5108      0.133    -33.832      0.000      -4.772      -4.249
    Model_MicraActive           -13.5738      0.223    -60.756      0.000     -14.012     -13.136
    Model_MicraDiesel           -13.4818      0.212    -63.576      0.000     -13.898     -13.066
    Model_MicraXE               -13.3985      0.252    -53.151      0.000     -13.893     -12.904
    Model_MicraXL               -13.6417      0.223    -61.171      0.000     -14.079     -13.204
    Model_MicraXV               -13.3898      0.216    -62.032      0.000     -13.813     -12.967
    Model_MobilioE               -4.1256      0.171    -24.182      0.000      -4.460      -3.791
    Model_MobilioRS              -4.0275      0.132    -30.420      0.000      -4.287      -3.768
    Model_MobilioS               -4.1556      0.103    -40.275      0.000      -4.358      -3.953
    Model_MobilioV               -3.9936      0.117    -34.095      0.000      -4.223      -3.764
    Model_Montero3.2            -24.4339      0.416    -58.706      0.000     -25.250     -23.618
    Model_MustangV8              -4.6117      0.196    -23.492      0.000      -4.997      -4.227
    Model_NanoCX                 -5.5041      0.172    -32.001      0.000      -5.841      -5.167
    Model_NanoCx                 -5.7012      0.133    -42.862      0.000      -5.962      -5.440
    Model_NanoLX                 -5.5080      0.132    -41.629      0.000      -5.767      -5.249
    Model_NanoLx                 -5.6141      0.117    -48.139      0.000      -5.843      -5.385
    Model_NanoSTD                -6.0777      0.173    -35.202      0.000      -6.416      -5.739
    Model_NanoTwist              -5.4198      0.102    -53.096      0.000      -5.620      -5.220
    Model_NanoXT                 -5.3621      0.134    -39.927      0.000      -5.625      -5.099
    Model_NanoXTA                -5.1968      0.103    -50.364      0.000      -5.399      -4.994
    Model_NewC-Class             -5.7923      0.091    -63.940      0.000      -5.970      -5.615
    Model_NewSafari              -4.4754      0.105    -42.567      0.000      -4.682      -4.269
    Model_Nexon1.2             2.531e-12   9.72e-13      2.603      0.009    6.25e-13    4.44e-12
    Model_Nexon1.5               -4.1240      0.173    -23.884      0.000      -4.463      -3.785
    Model_NuvoSportN6            -4.7146      0.171    -27.545      0.000      -5.050      -4.379
    Model_NuvoSportN8         -2.615e-12   8.19e-13     -3.195      0.001   -4.22e-12   -1.01e-12
    Model_Octavia1.9             -6.2216      0.177    -35.204      0.000      -6.568      -5.875
    Model_Octavia2.0             -5.2702      0.145    -36.472      0.000      -5.554      -4.987
    Model_OctaviaAmbiente        -6.0928      0.125    -48.593      0.000      -6.339      -5.847
    Model_OctaviaAmbition        -5.3890      0.124    -43.577      0.000      -5.631      -5.147
    Model_OctaviaClassic         -6.0666      0.138    -44.022      0.000      -6.337      -5.796
    Model_OctaviaElegance        -5.3664      0.112    -48.092      0.000      -5.585      -5.148
    Model_OctaviaL               -6.0409      0.176    -34.270      0.000      -6.387      -5.695
    Model_OctaviaRS              -6.2112      0.176    -35.205      0.000      -6.557      -5.865
    Model_OctaviaRider           -6.1500      0.140    -43.943      0.000      -6.424      -5.876
    Model_OctaviaStyle        -2.409e-12   5.73e-13     -4.206      0.000   -3.53e-12   -1.29e-12
    Model_Omni5                  -0.4261      0.193     -2.212      0.027      -0.804      -0.048
    Model_Omni8                  -0.5503      0.173     -3.187      0.001      -0.889      -0.212
    Model_OmniE                  -0.5724      0.186     -3.078      0.002      -0.937      -0.208
    Model_OmniMPI                -0.5397      0.182     -2.965      0.003      -0.897      -0.183
    Model_OneLX                 -98.8717      1.560    -63.395      0.000    -101.930     -95.814
    Model_Optra1.6               -9.3441      0.181    -51.752      0.000      -9.698      -8.990
    Model_OptraMagnum            -9.4623      0.158    -59.936      0.000      -9.772      -9.153
    Model_Outlander2.4          -24.7546      0.402    -61.557      0.000     -25.543     -23.966
    Model_Pajero2.8             -24.5173      0.397    -61.800      0.000     -25.295     -23.739
    Model_Pajero4X4             -24.7015      0.422    -58.476      0.000     -25.530     -23.873
    Model_PajeroSport           -24.5237      0.404    -60.764      0.000     -25.315     -23.732
    Model_Panamera2010          -21.0899      0.378    -55.822      0.000     -21.831     -20.349
    Model_PanameraDiesel        -21.1472      0.354    -59.682      0.000     -21.842     -20.453
    Model_Passat1.8              -7.2679      0.191    -38.087      0.000      -7.642      -6.894
    Model_Passat2.0            1.777e-12   9.31e-13      1.910      0.056   -4.75e-14     3.6e-12
    Model_PassatDiesel           -6.9680      0.137    -50.840      0.000      -7.237      -6.699
    Model_PassatHighline         -6.9587      0.190    -36.709      0.000      -7.330      -6.587
    Model_Petra1.2              -15.5801      0.275    -56.573      0.000     -16.120     -15.040
    Model_PlatinumEtios       -1.697e-12   7.22e-13     -2.352      0.019   -3.11e-12   -2.82e-13
    Model_Polo1.0                -7.4980      0.193    -38.864      0.000      -7.876      -7.120
    Model_Polo1.2                -7.5363      0.121    -62.142      0.000      -7.774      -7.299
    Model_Polo1.5                -7.5771      0.124    -61.227      0.000      -7.820      -7.334
    Model_PoloDiesel             -7.5681      0.120    -63.329      0.000      -7.802      -7.334
    Model_PoloGT                 -7.4551      0.131    -56.957      0.000      -7.712      -7.199
    Model_PoloGTI                -7.6520      0.160    -47.958      0.000      -7.965      -7.339
    Model_PoloIPL             -8.871e-13   6.45e-13     -1.375      0.169   -2.15e-12    3.78e-13
    Model_PoloPetrol             -7.5474      0.119    -63.644      0.000      -7.780      -7.315
    Model_PulsePetrol            -9.9911      0.218    -45.832      0.000     -10.419      -9.564
    Model_PulseRxL              -10.0897      0.179    -56.406      0.000     -10.440      -9.739
    Model_Punto1.2              -15.4665      0.285    -54.319      0.000     -16.025     -14.908
    Model_Punto1.3              -15.2772      0.279    -54.698      0.000     -15.825     -14.730
    Model_Punto1.4              -15.4527      0.280    -55.182      0.000     -16.002     -14.904
    Model_PuntoEVO             6.954e-13   9.99e-13      0.696      0.486   -1.26e-12    2.65e-12
    Model_Q32.0                  -6.7812      0.126    -53.993      0.000      -7.027      -6.535
    Model_Q32012-2015            -6.7844      0.123    -55.067      0.000      -7.026      -6.543
    Model_Q330                -4.369e-12   1.06e-12     -4.103      0.000   -6.46e-12   -2.28e-12
    Model_Q335                   -6.7832      0.127    -53.427      0.000      -7.032      -6.534
    Model_Q52.0                  -6.4911      0.118    -55.049      0.000      -6.722      -6.260
    Model_Q52008-2012            -6.5431      0.123    -53.283      0.000      -6.784      -6.302
    Model_Q53.0                  -6.5681      0.152    -43.231      0.000      -6.866      -6.270
    Model_Q530                   -6.3940      0.124    -51.564      0.000      -6.637      -6.151
    Model_Q73.0                  -6.4069      0.122    -52.635      0.000      -6.646      -6.168
    Model_Q735                   -6.3184      0.133    -47.499      0.000      -6.579      -6.058
    Model_Q74.2                  -6.4492      0.140    -46.133      0.000      -6.723      -6.175
    Model_Q745                   -6.0710      0.141    -43.205      0.000      -6.346      -5.795
    Model_QualisFS               -6.1899      0.198    -31.232      0.000      -6.578      -5.801
    Model_QualisFleet            -6.2484      0.202    -30.868      0.000      -6.645      -5.852
    Model_QualisRS               -6.1888      0.202    -30.568      0.000      -6.586      -5.792
    Model_QuantoC2               -4.9564      0.170    -29.237      0.000      -5.289      -4.624
    Model_QuantoC4               -4.7584      0.168    -28.282      0.000      -5.088      -4.429
    Model_QuantoC6               -4.7912      0.168    -28.493      0.000      -5.121      -4.461
    Model_QuantoC8               -4.7291      0.129    -36.567      0.000      -4.983      -4.476
    Model_R-ClassR350            -5.5495      0.133    -41.697      0.000      -5.810      -5.289
    Model_RS5Coupe               -6.2802      0.164    -38.253      0.000      -6.602      -5.958
    Model_Rapid1.5               -5.9734      0.106    -56.206      0.000      -6.182      -5.765
    Model_Rapid1.6               -6.0211      0.103    -58.223      0.000      -6.224      -5.818
    Model_Rapid2013-2016       9.712e-13   8.39e-13      1.158      0.247   -6.74e-13    2.62e-12
    Model_RapidLeisure         1.033e-12   9.85e-13      1.048      0.295   -8.99e-13    2.96e-12
    Model_RapidUltima            -6.3048      0.180    -35.032      0.000      -6.658      -5.952
    Model_RediGO                -28.6261      0.466    -61.479      0.000     -29.539     -27.713
    Model_RenaultLogan           -4.6565      0.167    -27.842      0.000      -4.984      -4.329
    Model_RitzAT              -2.691e-13   5.82e-13     -0.463      0.644   -1.41e-12    8.71e-13
    Model_RitzLDi                -0.0513      0.172     -0.297      0.766      -0.389       0.287
    Model_RitzLXI                 0.0873      0.221      0.394      0.693      -0.347       0.521
    Model_RitzLXi                -0.0871      0.192     -0.453      0.650      -0.464       0.290
    Model_RitzVDI                -0.0707      0.222     -0.319      0.750      -0.506       0.364
    Model_RitzVDi                -0.0407      0.161     -0.252      0.801      -0.357       0.275
    Model_RitzVXI                -0.0744      0.176     -0.424      0.672      -0.419       0.270
    Model_RitzVXi                -0.0806      0.170     -0.475      0.635      -0.413       0.252
    Model_RitzZDi                -0.0922      0.192     -0.479      0.632      -0.469       0.285
    Model_RitzZXI              8.212e-13   5.89e-13      1.395      0.163   -3.33e-13    1.98e-12
    Model_RitzZXi                 0.0438      0.222      0.198      0.843      -0.391       0.478
    Model_RoverDiscovery        -49.2817      0.784    -62.827      0.000     -50.820     -47.744
    Model_RoverFreelander       -49.4741      0.779    -63.538      0.000     -51.001     -47.947
    Model_RoverRange            -48.8920      0.778    -62.862      0.000     -50.417     -47.367
    Model_S-Class280          -2.875e-12   9.05e-13     -3.176      0.002   -4.65e-12    -1.1e-12
    Model_S-Class320             -5.2182      0.172    -30.265      0.000      -5.556      -4.880
    Model_S-ClassS               -5.3763      0.176    -30.618      0.000      -5.721      -5.032
    Model_S-CrossAlpha        -8.134e-13   9.46e-13     -0.860      0.390   -2.67e-12    1.04e-12
    Model_S-CrossDelta            0.3236      0.222      1.456      0.145      -0.112       0.759
    Model_S-CrossZeta          7.269e-13   7.45e-13      0.975      0.330   -7.35e-13    2.19e-12
    Model_S60D3                3.334e-13   6.58e-13      0.506      0.613   -9.57e-13    1.62e-12
    Model_S60D4                 -21.9055      0.362    -60.454      0.000     -22.616     -21.195
    Model_S60D5                 -21.9008      0.374    -58.601      0.000     -22.634     -21.168
    Model_S802006-2013          -22.4719      0.372    -60.402      0.000     -23.201     -21.742
    Model_S80D5               -2.341e-12   7.82e-13     -2.993      0.003   -3.88e-12   -8.07e-13
    Model_SClass                 -5.4716      0.102    -53.520      0.000      -5.672      -5.271
    Model_SCross                  0.4283      0.173      2.479      0.013       0.090       0.767
    Model_SL-ClassSL             -5.0114      0.190    -26.389      0.000      -5.384      -4.639
    Model_SLC43                  -5.2052      0.153    -34.066      0.000      -5.505      -4.906
    Model_SLK-Class55            -4.8603      0.191    -25.506      0.000      -5.234      -4.487
    Model_SLK-ClassSLK           -5.2134      0.149    -35.046      0.000      -5.505      -4.922
    Model_SX4Green               -0.1877      0.224     -0.838      0.402      -0.627       0.252
    Model_SX4S                    0.4418      0.171      2.591      0.010       0.107       0.776
    Model_SX4VDI                  0.1712      0.222      0.772      0.440      -0.264       0.606
    Model_SX4Vxi                  0.0339      0.167      0.203      0.839      -0.293       0.361
    Model_SX4ZDI                  0.1272      0.176      0.723      0.469      -0.217       0.472
    Model_SX4ZXI                  0.1013      0.170      0.596      0.551      -0.232       0.435
    Model_SX4Zxi                  0.0269      0.176      0.153      0.878      -0.317       0.371
    Model_SafariDICOR            -4.4303      0.176    -25.135      0.000      -4.776      -4.085
    Model_SafariStorme           -4.0820      0.113    -36.283      0.000      -4.303      -3.861
    Model_Sail1.2                -9.2441      0.183    -50.444      0.000      -9.603      -8.885
    Model_SailHatchback          -9.5374      0.168    -56.691      0.000      -9.867      -9.208
    Model_SailLT                 -9.6807      0.212    -45.563      0.000     -10.097      -9.264
    Model_SantaFe                -2.8409      0.089    -31.743      0.000      -3.016      -2.665
    Model_SantroAT               -3.6891      0.164    -22.455      0.000      -4.011      -3.367
    Model_SantroD                -3.9024      0.161    -24.238      0.000      -4.218      -3.587
    Model_SantroDX             4.293e-13   9.89e-13      0.434      0.664   -1.51e-12    2.37e-12
    Model_SantroGLS              -3.9380      0.095    -41.662      0.000      -4.123      -3.753
    Model_SantroGS               -3.6835      0.123    -29.847      0.000      -3.925      -3.441
    Model_SantroLP               -3.5956      0.162    -22.174      0.000      -3.913      -3.278
    Model_SantroLS               -4.2427      0.164    -25.836      0.000      -4.565      -3.921
    Model_SantroXing             -3.9233      0.059    -66.173      0.000      -4.039      -3.807
    Model_ScalaDiesel            -9.7957      0.218    -44.930      0.000     -10.223      -9.368
    Model_ScalaRxL              -10.1425      0.187    -54.335      0.000     -10.508      -9.776
    Model_Scorpio1.99            -4.0953      0.132    -31.114      0.000      -4.353      -3.837
    Model_Scorpio2.6             -4.3486      0.098    -44.514      0.000      -4.540      -4.157
    Model_Scorpio2009-2014       -4.1909      0.105    -39.730      0.000      -4.398      -3.984
    Model_ScorpioDX              -4.1549      0.167    -24.817      0.000      -4.483      -3.827
    Model_ScorpioLX              -4.3753      0.129    -33.908      0.000      -4.628      -4.122
    Model_ScorpioS10             -3.9778      0.116    -34.373      0.000      -4.205      -3.751
    Model_ScorpioS2           -1.983e-12   7.01e-13     -2.827      0.005   -3.36e-12   -6.08e-13
    Model_ScorpioS4              -4.2234      0.131    -32.255      0.000      -4.480      -3.967
    Model_ScorpioS6              -3.9975      0.116    -34.378      0.000      -4.226      -3.770
    Model_ScorpioS8              -4.0309      0.133    -30.228      0.000      -4.292      -3.769
    Model_ScorpioSLE             -4.2456      0.099    -42.817      0.000      -4.440      -4.051
    Model_ScorpioSLX           7.218e-13   6.18e-13      1.167      0.243   -4.91e-13    1.93e-12
    Model_ScorpioVLX             -4.1591      0.086    -48.565      0.000      -4.327      -3.991
    Model_Siena1.2              -15.6515      0.276    -56.758      0.000     -16.192     -15.111
    Model_Sonata2.4           -3.116e-13   9.35e-13     -0.333      0.739   -2.15e-12    1.52e-12
    Model_SonataEmbera           -3.0664      0.124    -24.821      0.000      -3.309      -2.824
    Model_SonataGOLD             -3.6291      0.161    -22.552      0.000      -3.945      -3.314
    Model_SonataTransform       1.18e-12   5.49e-13      2.151      0.032    1.04e-13    2.26e-12
    Model_Spark1.0               -9.6900      0.169    -57.406      0.000     -10.021      -9.359
    Model_SsangyongRexton        -3.9533      0.091    -43.240      0.000      -4.133      -3.774
    Model_SumoDX                 -4.2895      0.197    -21.759      0.000      -4.676      -3.903
    Model_SumoDelux              -4.2354      0.175    -24.177      0.000      -4.579      -3.892
    Model_SumoEX                 -4.6838      0.182    -25.702      0.000      -5.041      -4.327
    Model_Sunny2011-2014        -13.3117      0.211    -63.134      0.000     -13.725     -12.898
    Model_SunnyDiesel           -13.1698      0.255    -51.587      0.000     -13.670     -12.669
    Model_SunnyXE             -4.512e-13   8.05e-13     -0.560      0.575   -2.03e-12    1.13e-12
    Model_SunnyXL               -12.9906      0.254    -51.168      0.000     -13.488     -12.493
    Model_SunnyXV               -13.1788      0.231    -57.144      0.000     -13.631     -12.727
    Model_Superb1.8              -5.6058      0.122    -45.888      0.000      -5.845      -5.366
    Model_Superb2.5           -4.392e-13   5.63e-13     -0.780      0.435   -1.54e-12    6.65e-13
    Model_Superb2.8              -5.6891      0.138    -41.244      0.000      -5.960      -5.419
    Model_Superb2009-2014        -4.9904      0.182    -27.432      0.000      -5.347      -4.634
    Model_Superb3.6            2.376e-12   8.27e-13      2.873      0.004    7.54e-13       4e-12
    Model_SuperbAmbition         -5.5036      0.178    -30.850      0.000      -5.853      -5.154
    Model_SuperbElegance         -5.4992      0.102    -53.887      0.000      -5.699      -5.299
    Model_SuperbL&K              -5.0246      0.148    -33.986      0.000      -5.315      -4.735
    Model_SuperbStyle            -5.4082      0.124    -43.545      0.000      -5.652      -5.165
    Model_Swift1.3                0.1239      0.167      0.741      0.459      -0.204       0.452
    Model_SwiftAMT                0.0855      0.193      0.442      0.659      -0.294       0.465
    Model_SwiftDDiS               0.1151      0.182      0.633      0.527      -0.242       0.472
    Model_SwiftDzire              0.1869      0.158      1.180      0.238      -0.124       0.498
    Model_SwiftLDI                0.0474      0.171      0.278      0.781      -0.287       0.382
    Model_SwiftLXI                0.2370      0.181      1.309      0.191      -0.118       0.592
    Model_SwiftLXi                0.0941      0.221      0.426      0.670      -0.339       0.527
    Model_SwiftLdi                0.1085      0.173      0.628      0.530      -0.230       0.447
    Model_SwiftLxi               -0.1151      0.176     -0.656      0.512      -0.459       0.229
    Model_SwiftRS                 0.1850      0.222      0.835      0.404      -0.249       0.619
    Model_SwiftVDI                0.1551      0.159      0.975      0.330      -0.157       0.467
    Model_SwiftVDi            -2.104e-12   5.37e-13     -3.918      0.000   -3.16e-12   -1.05e-12
    Model_SwiftVVT                0.1283      0.193      0.666      0.506      -0.249       0.506
    Model_SwiftVXI                0.0927      0.161      0.577      0.564      -0.223       0.408
    Model_SwiftVXi                0.0176      0.182      0.097      0.923      -0.338       0.374
    Model_SwiftVdi                0.1839      0.176      1.047      0.295      -0.160       0.528
    Model_SwiftZDI                0.1386      0.222      0.625      0.532      -0.296       0.574
    Model_SwiftZDi                0.2342      0.166      1.410      0.159      -0.092       0.560
    Model_SwiftZXI                0.2081      0.172      1.208      0.227      -0.130       0.546
    Model_TT2.0                  -6.4847      0.190    -34.170      0.000      -6.857      -6.113
    Model_TT40                   -5.9077      0.182    -32.539      0.000      -6.264      -5.552
    Model_TUV300                 -4.5025      0.098    -46.041      0.000      -4.694      -4.311
    Model_TaveraLS               -9.3899      0.236    -39.789      0.000      -9.853      -8.927
    Model_TaveraLT               -8.8982      0.226    -39.349      0.000      -9.342      -8.455
    Model_Teana230jM          -1.144e-12   1.03e-12     -1.113      0.266   -3.16e-12    8.71e-13
    Model_TeanaXV               -12.8632      0.262    -49.188      0.000     -13.376     -12.351
    Model_TerranoXL             -13.0631      0.214    -60.928      0.000     -13.483     -12.643
    Model_TerranoXV             -13.0035      0.216    -60.286      0.000     -13.426     -12.581
    Model_TharCRDe               -4.2702      0.116    -36.950      0.000      -4.497      -4.044
    Model_TharDI                 -4.5283      0.172    -26.377      0.000      -4.865      -4.192
    Model_Tiago1.2               -4.7574      0.102    -46.745      0.000      -4.957      -4.558
    Model_TiagoAMT             1.262e-12   5.72e-13      2.205      0.027     1.4e-13    2.38e-12
    Model_TiagoWizz            3.105e-12   6.59e-13      4.714      0.000    1.81e-12     4.4e-12
    Model_Tigor1.05              -4.4467      0.173    -25.706      0.000      -4.786      -4.108
    Model_Tigor1.2               -4.6836      0.134    -34.861      0.000      -4.947      -4.420
    Model_TigorXE             -1.755e-12   1.03e-12     -1.698      0.090   -3.78e-12    2.72e-13
    Model_Tiguan2.0              -6.3560      0.193    -32.863      0.000      -6.735      -5.977
    Model_Tucson2.0              -2.5898      0.166    -15.626      0.000      -2.915      -2.265
    Model_TucsonCRDi             -2.7089      0.170    -15.939      0.000      -3.042      -2.376
    Model_V40Cross              -21.8287      0.372    -58.666      0.000     -22.558     -21.099
    Model_V40D3                 -21.9063      0.364    -60.262      0.000     -22.619     -21.194
    Model_Vento1.2             5.416e-13    7.5e-13      0.722      0.470   -9.29e-13    2.01e-12
    Model_Vento1.5               -7.3840      0.122    -60.486      0.000      -7.623      -7.145
    Model_Vento1.6               -7.4029      0.128    -57.851      0.000      -7.654      -7.152
    Model_Vento2013-2015         -7.4946      0.159    -47.015      0.000      -7.807      -7.182
    Model_VentoDiesel            -7.4081      0.119    -62.234      0.000      -7.641      -7.175
    Model_VentoIPL               -7.6005      0.156    -48.819      0.000      -7.906      -7.295
    Model_VentoKonekt            -7.2200      0.190    -37.930      0.000      -7.593      -6.847
    Model_VentoMagnific          -7.3943      0.190    -38.885      0.000      -7.767      -7.021
    Model_VentoPetrol            -7.4415      0.120    -62.139      0.000      -7.676      -7.207
    Model_VentoSport             -7.3029      0.158    -46.343      0.000      -7.612      -6.994
    Model_VentoTSI             2.329e-12   8.15e-13      2.859      0.004    7.32e-13    3.93e-12
    Model_VentureEX              -4.7516      0.180    -26.436      0.000      -5.104      -4.399
    Model_Verito1.5              -4.7821      0.118    -40.456      0.000      -5.014      -4.550
    Model_Verna1.4               -3.3457      0.087    -38.623      0.000      -3.515      -3.176
    Model_Verna1.6               -3.3032      0.062    -52.940      0.000      -3.426      -3.181
    Model_VernaCRDi              -3.4769      0.069    -50.095      0.000      -3.613      -3.341
    Model_VernaSX                -3.3096      0.083    -39.665      0.000      -3.473      -3.146
    Model_VernaTransform         -3.6204      0.081    -44.424      0.000      -3.780      -3.461
    Model_VernaVTVT              -3.2389      0.079    -40.793      0.000      -3.395      -3.083
    Model_VernaXXi               -3.6599      0.162    -22.549      0.000      -3.978      -3.342
    Model_VernaXi                -3.8194      0.163    -23.415      0.000      -4.139      -3.500
    Model_VersaDX2                0.0337      0.228      0.148      0.883      -0.414       0.482
    Model_VitaraBrezza            0.3896      0.161      2.423      0.015       0.074       0.705
    Model_WR-VEdge               -4.1562      0.168    -24.778      0.000      -4.485      -3.827
    Model_WRVi-VTEC              -3.9613      0.129    -30.816      0.000      -4.213      -3.709
    Model_WagonR                 -0.1470      0.158     -0.930      0.352      -0.457       0.163
    Model_X-TrailSLX            -12.5613      0.233    -53.813      0.000     -13.019     -12.104
    Model_X1M                   -10.0304      0.190    -52.674      0.000     -10.404      -9.657
    Model_X1sDrive              -10.2879      0.164    -62.667      0.000     -10.610      -9.966
    Model_X1sDrive20d           -10.3399      0.166    -62.214      0.000     -10.666     -10.014
    Model_X1xDrive               -9.9258      0.218    -45.568      0.000     -10.353      -9.499
    Model_X3xDrive               -9.8674      0.177    -55.888      0.000     -10.214      -9.521
    Model_X3xDrive20d           -10.0053      0.171    -58.548      0.000     -10.340      -9.670
    Model_X3xDrive30d            -9.9052      0.218    -45.362      0.000     -10.333      -9.477
    Model_X52014-2019            -9.6152      0.183    -52.491      0.000      -9.974      -9.256
    Model_X53.0d                 -9.8939      0.177    -55.899      0.000     -10.241      -9.547
    Model_X5X5                   -9.4532      0.191    -49.367      0.000      -9.829      -9.078
    Model_X5xDrive               -9.6457      0.168    -57.344      0.000      -9.975      -9.316
    Model_X6xDrive               -9.3552      0.182    -51.413      0.000      -9.712      -8.998
    Model_X6xDrive30d            -9.5430      0.189    -50.375      0.000      -9.914      -9.172
    Model_XC60D4                -21.8700      0.362    -60.497      0.000     -22.579     -21.161
    Model_XC60D5                -21.8938      0.356    -61.539      0.000     -22.591     -21.196
    Model_XC902007-2015         -21.6101      0.381    -56.792      0.000     -22.356     -20.864
    Model_XE2.0L              -6.131e-13   4.43e-13     -1.384      0.166   -1.48e-12    2.56e-13
    Model_XEPortfolio         -4.061e-13   2.33e-13     -1.743      0.081   -8.63e-13    5.06e-14
    Model_XF2.0                 -19.7848      0.346    -57.132      0.000     -20.464     -19.106
    Model_XF2.2                 -19.8963      0.318    -62.619      0.000     -20.519     -19.273
    Model_XF3.0                 -20.1215      0.318    -63.354      0.000     -20.744     -19.499
    Model_XFAero                -19.5800      0.339    -57.700      0.000     -20.245     -18.915
    Model_XFDiesel              -20.0804      0.322    -62.355      0.000     -20.712     -19.449
    Model_XJ2.0L                -19.2596      0.348    -55.405      0.000     -19.941     -18.578
    Model_XJ3.0L                -19.4028      0.326    -59.570      0.000     -20.041     -18.764
    Model_XJ5.0                 -19.5345      0.344    -56.745      0.000     -20.210     -18.860
    Model_XUV300W8               -3.9942      0.171    -23.300      0.000      -4.330      -3.658
    Model_XUV500AT               -3.9524      0.093    -42.556      0.000      -4.134      -3.770
    Model_XUV500W10              -3.8637      0.087    -44.164      0.000      -4.035      -3.692
    Model_XUV500W4               -4.1634      0.116    -36.038      0.000      -4.390      -3.937
    Model_XUV500W6               -4.0692      0.090    -45.260      0.000      -4.245      -3.893
    Model_XUV500W7               -4.1544      0.171    -24.323      0.000      -4.489      -3.820
    Model_XUV500W8               -4.0323      0.077    -52.450      0.000      -4.183      -3.882
    Model_XUV500W9                -4e-13   3.43e-13     -1.167      0.243   -1.07e-12    2.72e-13
    Model_Xcent1.1               -3.6344      0.075    -48.770      0.000      -3.781      -3.488
    Model_Xcent1.2               -3.6668      0.068    -53.637      0.000      -3.801      -3.533
    Model_XenonXT                -4.5936      0.122    -37.769      0.000      -4.832      -4.355
    Model_XyloD2                 -5.0120      0.134    -37.493      0.000      -5.274      -4.750
    Model_XyloD4                 -4.6961      0.104    -45.327      0.000      -4.899      -4.493
    Model_XyloE2                 -4.7978      0.169    -28.452      0.000      -5.128      -4.467
    Model_XyloE4                 -4.5368      0.131    -34.708      0.000      -4.793      -4.281
    Model_XyloE8                 -4.5919      0.118    -39.042      0.000      -4.823      -4.361
    Model_XyloH4                 -4.3561      0.172    -25.343      0.000      -4.693      -4.019
    Model_YetiAmbition           -5.5909      0.130    -43.096      0.000      -5.845      -5.337
    Model_YetiElegance           -5.4991      0.130    -42.151      0.000      -5.755      -5.243
    Model_Z42009-2013            -9.5870      0.194    -49.443      0.000      -9.967      -9.207
    Model_ZenEstilo              -0.3639      0.166     -2.190      0.029      -0.690      -0.038
    Model_ZenLX                  -0.3016      0.192     -1.574      0.116      -0.677       0.074
    Model_ZenLXI               2.961e-17   1.73e-17      1.714      0.087   -4.26e-18    6.35e-17
    Model_ZenLXi                 -0.2961      0.181     -1.639      0.101      -0.650       0.058
    Model_ZenVX                   0.0019      0.221      0.009      0.993      -0.431       0.435
    Model_ZenVXI                 -0.2790      0.181     -1.541      0.123      -0.634       0.076
    Model_ZenVXi                 -0.1740      0.221     -0.788      0.431      -0.607       0.259
    Model_ZestQuadrajet          -4.6607      0.113    -41.395      0.000      -4.881      -4.440
    Model_ZestRevotron           -4.5122      0.099    -45.371      0.000      -4.707      -4.317
    Model_i10Asta                -3.5636      0.090    -39.566      0.000      -3.740      -3.387
    Model_i10Era                 -3.7607      0.065    -57.629      0.000      -3.889      -3.633
    Model_i10Magna               -3.6895      0.061    -60.302      0.000      -3.810      -3.570
    Model_i10Magna(O)            -3.6090      0.162    -22.231      0.000      -3.927      -3.291
    Model_i10Sportz              -3.7076      0.061    -60.603      0.000      -3.828      -3.588
    Model_i201.2                 -3.5059      0.064    -55.086      0.000      -3.631      -3.381
    Model_i201.4                 -3.5404      0.067    -52.771      0.000      -3.672      -3.409
    Model_i202015-2017           -3.5293      0.080    -44.362      0.000      -3.685      -3.373
    Model_i20Active              -3.4474      0.078    -44.056      0.000      -3.601      -3.294
    Model_i20Asta                -3.3911      0.065    -51.836      0.000      -3.519      -3.263
    Model_i20Diesel              -3.4774      0.166    -20.945      0.000      -3.803      -3.152
    Model_i20Era                 -3.6378      0.163    -22.263      0.000      -3.958      -3.317
    Model_i20Magna               -3.5601      0.066    -53.850      0.000      -3.690      -3.430
    Model_i20Sportz              -3.4857      0.064    -54.491      0.000      -3.611      -3.360
    Model_redi-GOS              -28.4620      0.466    -61.044      0.000     -29.376     -27.548
    Model_redi-GOT              -28.4673      0.450    -63.209      0.000     -29.350     -27.584
    ==============================================================================
    Omnibus:                      729.914   Durbin-Watson:                   1.976
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11194.465
    Skew:                          -0.350   Prob(JB):                         0.00
    Kurtosis:                      10.957   Cond. No.                     1.68e+21
    ==============================================================================
    
    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    [2] The smallest eigenvalue is 1.02e-32. This might indicate that there are
    strong multicollinearity problems or that the design matrix is singular.
    
    In [40]:
    get_model_score_adjusted_R2(olsmodel2)
    
    Adjusted R2 on training set :  0.9554174298292273
    Adjusted R2 on test set :  -2.9916319917660635e+83
    RMSE on training set :  2.1857914088403443
    RMSE on test set :  4.4582786612795244e+42
    
    Out[40]:
    [0.9554174298292273,
     -2.9916319917660635e+83,
     2.1857914088403443,
     4.4582786612795244e+42]
    In [92]:
    pval_filter = olsmod['pval']<= 0.05
    imp_vars = olsmod[pval_filter].index.tolist()
    
    # We are going to get overall varaibles (un-one-hot encoded varables) from categorical varaibles
    sig_var = []
    for col in imp_vars:
        if '' in col:
            first_part = col.split('_')[0]
            for c in cars_data.columns:
                if first_part in c and c not in sig_var :
                    sig_var.append(c)
    
                    
    start = '\033[1m'
    end = '\033[95m'
    print(start+ 'Most overall significant categorical variables of LINEAR REGRESSION  are ' +end,':\n', sig_var)
    
    Most overall significant categorical variables of LINEAR REGRESSION  are  :
     ['Year', 'Mileage', 'Power', 'kilometers_driven_log', 'Location', 'Fuel_Type', 'Transmission', 'Owner_Type', 'Brand', 'Model']
    
    In [42]:
    # Using Adjusted R2 resulted in a bad model, just like the R2.  We will drop both OLS models.
    

    Ridge / Lasso Regression¶

    Build Ridge / Lasso Regression similar to Linear Regression:

    https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

    Ridge

    In [165]:
    # Import Ridge/ Lasso Regression from sklearn
    from sklearn.linear_model import Ridge, Lasso
    
    In [166]:
    # Create a Ridge regression model
    ridge = Ridge(alpha=1.0)
    
    In [167]:
    # Fit Ridge regression model
    ridge.fit(X_train,y_train['price_log'])
    
    Out[167]:
    Ridge()
    In [168]:
    # Get score of the model
    ridge_score = get_model_score(ridge)
    
    R-square on training set :  0.9495426277022899
    R-square on test set :  0.9082463599330018
    RMSE on training set :  2.508411560689403
    RMSE on test set :  3.382079182882414
    
    In [169]:
    import numpy as np
    
    # Train the model
    ridge.fit(X_train, y_train)
    
    # Get the coefficients
    coefficients = ridge.coef_
    
    # Get the absolute values of the coefficients
    coef_abs = np.abs(coefficients)
    
    # Get the indices of the k largest absolute values
    k = 7
    most_important = np.argpartition(coef_abs, -k)[-k:]
    
    # Get the corresponding feature names
    most_important_features = [X_train.columns[i] for i in most_important]
    
    In [171]:
    #**Observations from results: _____**
    #RIDGE
    #a) R-square on training set: 0.9495426277022899 and R-square on test set: 0.9082463599330018 are indicating 
    #that the model is performing well on both the training and test sets. A high R-squared value (closer to 1) 
    #indicates that the model is explaining a large proportion of the variance in the data. 
    #The R-squared value for the test set is lower than the R-squared value for the training set, which is expected. 
    #In general, the test set score should be lower than the training set score #because the model has not seen the 
    #test data before.
    #b) RMSE on training set: 2.508411560689403 and RMSE on test set: 3.382079182882414 are indicating the error of the 
    #model on #both training and test set. RMSE (Root Mean Squared Error) is a measure of the difference between 
    #the predicted and actual #values. The lower the RMSE, the better the model.
    

    Lasso

    In [48]:
    #create lasso regression model 
    lasso=Lasso(alpha=1.0)
    
    In [49]:
    #Fit Lasso regression model
    lasso.fit(X_train,y_train['price_log'])
    
    Out[49]:
    Lasso()
    In [50]:
    # Get score of the model
    lasso_score = get_model_score(lasso)
    
    R-square on training set :  -3.4840811654275816
    R-square on test set :  0.2077302968471606
    RMSE on training set :  24.138711598481414
    RMSE on test set :  9.43820300687797
    

    Observations from results: _

    SUMMARY (For score details, see individual models)

    LINEAR REGRESSION Overall, all the models perform badly. The models are overfitting the training data and is not generalizing well to unseen data. Linear regression from scikit-learn and OLS from statsmodels are different implementations of linear regression, which resulted in different results. Scikit-learn uses the Ordinary Least Squares (OLS) method as the default implementation of linear regression, but statsmodels provides more options and detailed output for OLS models, including hypothesis testing, confidence intervals, and various statistical measures. The get_model_score function returned different values for both models because the underlying implementation is different.

    RIDGE Values are still high indicating that we still need to improve the model.

    LASSO Giving extremely bad results. Worst than all methods so far.

    Decision Tree and Random Forest¶

    Decision Tree¶

    https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html

    In [51]:
    # Import Decision tree for Regression from sklearn
    from sklearn.tree import DecisionTreeRegressor 
    
    In [52]:
    # Create a decision tree regression model, use random_state = 1
    dtree = DecisionTreeRegressor(random_state = 1) 
    
    In [53]:
    # Fit decision tree regression model
    dtree.fit(X_train, y_train['price_log'])
    
    Out[53]:
    DecisionTreeRegressor(random_state=1)
    In [54]:
    # Get score of the model
    Dtree_model = get_model_score(dtree)
    
    R-square on training set :  0.9999991628779447
    R-square on test set :  0.8046161304354573
    RMSE on training set :  0.010429698294569224
    RMSE on test set :  4.687023633168094
    

    Observations from results: _

    The model has a very high R-squared on the training set (0.9999991628779447) which indicates that the model is fitting the training data very well. However, the R-squared on the test set (0.8046161304354573) is significantly lower, indicating that the model is overfitting to the training data and not generalizing well to new, unseen data.

    The RMSE on the training set (0.010429698294569224) is also very low which confirms that the model is fitting the training data very well, however, the RMSE on the test set (4.687023633168094) is much higher, which indicates that the model is not performing well on unseen data.

    Overall, the model is overfitting to the training data.

    Print the importance of features in the tree building. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

    In [55]:
    print(pd.DataFrame(dtree.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
    
                                   Imp
    Power                     0.629740
    Year                      0.231035
    Engine                    0.030364
    kilometers_driven_log     0.015019
    Mileage                   0.010386
    ...                            ...
    Model_PulsePetrol         0.000000
    Model_CLA200              0.000000
    Model_PoloIPL             0.000000
    Model_CLS-Class2006-2010  0.000000
    Model_Q735                0.000000
    
    [738 rows x 1 columns]
    
    In [56]:
    #plot graph of feature importances for Decision Tree for better analysis
    
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(dtree.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    Observations and insights: _

    Gini importance is a measure of the importance of each feature (predictor variable) in a decision tree or random forest model. It is calculated by measuring the decrease in the Gini impurity of the node when a feature is used to split the data, and averaging the results over all of the trees in the forest. Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. Lower Gini impurity indicates a more pure subset of the data, and therefore a feature with a high Gini importance is considered to be more important in the prediction of the target variable.

    The feature with the highest score, "Power" in this case, is considered to be the most important feature in the model. The feature with the second highest score, "Year" in this case, is considered to be the second most important feature and so on.

    The low Gini importance values of "Engine", "kilometers_driven_log" and "Mileage" indicate that these features may not be as important in the prediction of the target variable as the other features.

    Random Forest¶

    https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

    In [57]:
    # Import Randomforest for Regression from sklearn
    from sklearn.ensemble import RandomForestRegressor
    
    In [58]:
    # Create a Randomforest regression model 
    clf = RandomForestRegressor(n_estimators=100)
    
    In [59]:
    # Fit Randomforest regression model
    clf.fit(X_train, y_train['price_log'])
    
    Out[59]:
    RandomForestRegressor()
    In [60]:
    # Get score of the model
    clf_model = get_model_score(clf)
    
    R-square on training set :  0.9834134401371122
    R-square on test set :  0.8758389940624158
    RMSE on training set :  1.468099577409237
    RMSE on test set :  3.736331500042392
    

    Observations and insights: _

    The R-squared values on the training and test sets are high and low, respectively. This suggests that the model is overfitting to the training data and not generalizing well to the test data.

    The RMSE values on the training and test sets are also low and high, respectively, further confirming overfitting.

    Feature Importance

    In [61]:
    # Print important features similar to decision trees
    print(pd.DataFrame(clf.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
    
                                Imp
    Power                  0.623214
    Year                   0.227837
    Engine                 0.030693
    kilometers_driven_log  0.016297
    Mileage                0.013841
    ...                         ...
    Model_Nexon1.2         0.000000
    Model_CR-V2.0          0.000000
    Model_NuvoSportN8      0.000000
    Model_VentoTSI         0.000000
    Model_Figo1.2P         0.000000
    
    [738 rows x 1 columns]
    
    In [143]:
    #plot graph of feature importances for Random Forest for better analysis
    
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(clf.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    Observations and insights: _

    Not much difference in values between Decision Tree and Random Forest. Both indicate overfitting.

    Hyperparameter Tuning: Decision Tree¶

    In [62]:
    #To tune a decision tree, we use the following parameters:
    
    # max_depth: The maximum depth of the tree. Increasing this value will make the model more complex, 
    #   while decreasing it will make the model less complex.
    
    # min_samples_split: The minimum number of samples required to split an internal node. 
    #   Increasing this value will make the model less complex, as it will require more samples to split a node.
    
    # min_samples_leaf: The minimum number of samples required to be at a leaf node. 
    #   Increasing this value will make the model less complex, 
    #   as it will require more samples to be present at a leaf node.
    
    # max_features controls the number of features that are considered when splitting a node.
    #   set max_features to "auto" - DEPRICATED,  so can't use
    #   set max_features to "sqrt", the algorithm will select the number of features equal to the square root of the total number of features.
    #   set max_features to "log2", the algorithm will select the number of features equal to log2(total number of features).
    #   set max_features is set to None, then all features will be considered when splitting a node.
    
    In [63]:
    #importing required libraries
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import recall_score, make_scorer
    
    # Choose the type of estimator 
    dtree_tuned = DecisionTreeRegressor(random_state = 1)
    
    # Grid of parameters to choose from
    # Check documentation for all the parametrs that the model takes and play with those
    parameters = {'splitter':["best","random"],
     #   'max_depth': [1, 3, 5, 7, 9, 11, 12, 15],
        'min_samples_leaf': [5, 10, 20, 25],
       # 'min_weight_fraction_leaf': [.5],
                  'max_features': [None],
                 }
    
    # Type of scoring used to compare parameter combinations
    scorer = 'neg_mean_squared_error'
    
    # Run the grid search
    grid_obj = GridSearchCV(estimator=dtree_tuned,param_grid=parameters, cv=10, verbose=1, scoring = scorer)
    
    grid_obj = grid_obj.fit(X_train,y_train)
    
    # Set the model to the best combination of parameters
    dtree_tuned = grid_obj.best_estimator_
    
    # Fit the best algorithm to the data
    dtree_tuned.fit(X_train,y_train['price_log'])
    
    Fitting 10 folds for each of 8 candidates, totalling 80 fits
    
    Out[63]:
    DecisionTreeRegressor(min_samples_leaf=10, random_state=1)
    In [64]:
    # Get score of tuned model
    dtree_tuned_model = get_model_score(dtree_tuned)
    
    R-square on training set :  0.8892250655242453
    R-square on test set :  0.7862076836289769
    RMSE on training set :  3.794006794630731
    RMSE on test set :  4.90285259664256
    

    Observations and insights: _

    -Working with min_samples_leaf resulted in negative R-squares and high RMSE. So I stopped tuning that parameter. -Increasing the number of features to consider when looking for the best split resulted in better values, but still showed overfitting. -I set max_features to None and that helped the model, but still shows overfitting. -Tuning these feature was heavy computationally and time wise. If I had more time, I would try other combinations.

    Feature Importance

    In [65]:
    # Print important features of tuned decision tree similar to decision trees
    print(pd.DataFrame(dtree_tuned.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
    
                                   Imp
    Power                     0.678275
    Year                      0.245198
    Engine                    0.024863
    Mileage                   0.010325
    kilometers_driven_log     0.008252
    ...                            ...
    Model_Endeavour3.0L       0.000000
    Model_Endeavour3.2        0.000000
    Model_Endeavour4x2        0.000000
    Model_EndeavourHurricane  0.000000
    Model_redi-GOT            0.000000
    
    [738 rows x 1 columns]
    
    In [66]:
    #plot graph of feature importances for Tuned Decision Tree for better analysis
    
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(dtree_tuned.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    Feature Importance: Power is the most important variable for Price, followed by Year, Engine and Mileage.

    Hyperparameter Tuning: Random Forest¶

    In [67]:
    #Some Important Parameters
    
    # n_estimatorsint, default=100  --> The number of trees in the forest.Default=100
    # max_depthint, default=None --> The maximum depth of the tree.  If None, then nodes are expanded 
    #    until all leaves are pure or until all leaves contain less than min_samples_split samples.
    # min_samples_splitint or float, default=2 --> The minimum number of samples required to split an internal node
    # min_samples_leafint or float, default=1 --> The minimum number of samples required to be at a leaf node.
    # min_weight_fraction_leaffloat, default=0.0 --> The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. 
    # max_features{“sqrt”, “log2”, None}, int or float, default=1.0 The number of features to consider when looking for the best split
    # max_leaf_nodesint, default=None  -->  Grow trees with max_leaf_nodes in best-first fashion. 
    #  Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
    # min_impurity_decreasefloat, default=0.0 --> A node will be split if this split induces a decrease of the impurity greater than or equal to this value
    # max_samplesint or float, default=None --> If bootstrap is True, the number of samples to draw from X to train each base estimator.
    
    In [68]:
    # Choose the type of Regressor
    randomforest_tuned = RandomForestRegressor(random_state=1)
    
    # Define the parameters for Grid to choose from 
    parameters={'max_depth': [1, 2, 3, 5, 7, 9, 10, 11, 12],
        'min_samples_leaf': [5, 10, 20, 25],
         'max_features': [None]
      }
    
    # Check documentation for all the parametrs that the model takes and play with those:  see above
    # Type of scoring used to compare parameter combinations
    scorer = metrics.make_scorer(metrics.mean_absolute_error, greater_is_better=False)
    
    # Create classifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    
    # Define the parameter grid
    param_grid = {
        'n_estimators': [50, 100, 200],
        'max_depth': [None, 5, 10],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'max_features': ['auto', 'sqrt', 'log2']
    }
    
    # Run the grid search
    grid_obj = GridSearchCV(estimator=randomforest_tuned,param_grid=parameters,
                            cv=10, verbose=1, scoring = scorer)
    grid_obj = grid_obj.fit(X_train,y_train)
    
    # Set the model to the best combination of parameters
    randomforest_tuned=grid_obj.best_estimator_
    
    # Fit the best algorithm to the data
    randomforest_tuned.fit(X_train,y_train['price_log'])
    
    Fitting 10 folds for each of 36 candidates, totalling 360 fits
    
    Out[68]:
    RandomForestRegressor(max_depth=12, max_features=None, min_samples_leaf=5,
                          random_state=1)
    In [69]:
    # Get score of tuned model
    randomforest_tuned_model = get_model_score(randomforest_tuned)
    
    R-square on training set :  0.9209426273014948
    R-square on test set :  0.8353974692885664
    RMSE on training set :  3.2051513501650515
    RMSE on test set :  4.3020063205316985
    
    In [70]:
    # Print important features of tuned decision tree similar to decision trees
    print(pd.DataFrame(randomforest_tuned.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
    
                                Imp
    Power                  0.656538
    Year                   0.240443
    Engine                 0.030342
    Mileage                0.014148
    kilometers_driven_log  0.012575
    ...                         ...
    Model_Fiesta1.5        0.000000
    Model_Fiesta1.6        0.000000
    Model_FiestaClassic    0.000000
    Model_FiestaDiesel     0.000000
    Model_redi-GOT         0.000000
    
    [738 rows x 1 columns]
    
    In [71]:
    #plot graph of feature importances for Random Forest for better visualization
    
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(randomforest_tuned.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    Observations and insights: _ -Overall the model looks good but is still overfitting. -Tuning these feature was heavy computationally and time wise. If I had more time, I would try other combinations.

    Feature Importance: Power is the most important variable for Price, followed by Year, Engine and Mileage.

    KNN¶

    In [72]:
    # Create KNN Model 
    
    In [153]:
    from sklearn.neighbors import KNeighborsRegressor
    knn= KNeighborsRegressor()
    knn.fit(X_train, y_train["price_log"])
    get_model_score(knn)
    
    R-square on training set :  0.8924922061501867
    R-square on test set :  0.7860474214077737
    RMSE on training set :  3.7376387902590227
    RMSE on test set :  4.904689881687676
    
    Out[153]:
    [0.8924922061501867, 0.7860474214077737, 3.7376387902590227, 4.904689881687676]
    In [154]:
    knn_model = get_model_score(knn)
    
    R-square on training set :  0.8924922061501867
    R-square on test set :  0.7860474214077737
    RMSE on training set :  3.7376387902590227
    RMSE on test set :  4.904689881687676
    

    In a non-parametric model such as KNeighborsRegressor, the feature importances cannot be determined as easily as in a parametric model like linear regression. However, there are some methods we can use to get an understanding of which features are affecting the target variable:

    Feature Selection: We can use feature selection techniques like Recursive Feature Elimination (RFE) or SelectFromModel to find the most important features.

    Correlation: We can calculate the correlation between the features and the target variable and select the features with the highest correlation.

    Permutation Importance: We can use permutation importance to determine the feature importances by randomly shuffling the values of a single feature and measuring the impact on the model's performance.

    These are just some methods to understand the feature importances in a non-parametric model like KNeighborsRegressor. Note that these methods may not be as interpretable as the coefficients in a linear regression model, but they can still provide valuable insights into the features that are affecting the target variable.

    XGBoost¶

    Create XGBoost Model¶

    In [75]:
    import os
    import xgboost
    
    from xgboost import XGBRegressor
    xgb = xgboost.XGBRegressor()
    xgb.fit(X_train, y_train["price_log"])
    get_model_score(xgb)
    
    R-square on training set :  0.979436780937607
    R-square on test set :  0.9036778306183256
    RMSE on training set :  1.6346429452129905
    RMSE on test set :  3.290909336003646
    
    Out[75]:
    [0.979436780937607, 0.9036778306183256, 1.6346429452129905, 3.290909336003646]
    In [76]:
    xgb_model = get_model_score(xgb)
    
    R-square on training set :  0.979436780937607
    R-square on test set :  0.9036778306183256
    RMSE on training set :  1.6346429452129905
    RMSE on test set :  3.290909336003646
    
    In [77]:
    # Print important features of xgb
    print(pd.DataFrame(xgb.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
    
                               Imp
    Power                 0.233674
    Transmission_Manual   0.091888
    Fuel_Type_Diesel      0.042929
    Engine                0.037026
    Year                  0.034153
    ...                        ...
    Model_FiestaTitanium  0.000000
    Model_FiestaEXi       0.000000
    Model_Fiesta1.6       0.000000
    Model_Fiesta1.5       0.000000
    Model_redi-GOT        0.000000
    
    [738 rows x 1 columns]
    
    In [78]:
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(xgb.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    
    In [149]:
    # get original features
    original_features = list(X_train.columns)
    
    # exclude dummy variables
    dummy_features = [col for col in X_train.columns if col not in original_features]
    dummy_indices = [i for i, feature in enumerate(X_train.columns) if feature in dummy_features]
    
    # compute feature importances excluding dummy variables
    importance = xgb.feature_importances_
    importance[dummy_indices] = 0
    
    # plot the top 7 feature importances of original features only
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(importance, index=X_train.columns)
    feat_importances = feat_importances[original_features]
    feat_importances = feat_importances.sort_values(ascending=False)
    feat_importances[:7].plot(kind='barh')
    plt.show()
    

    Create XGBoost Tuned Model¶

    In [79]:
    # Hyperparameter tuning for XGBoost
    
    # from xgboost import XGBRegressor - done previously for xgboost
    # from sklearn.model_selection import GridSearchCV - done for random forest
    
    # Define the parameters to be tuned
    parameters_grid_xgb = {'learning_rate': [0.1, 0.01, 0.001], 
                  'max_depth': [3, 5, 7], 
                  'subsample': [0.6, 0.8, 1.0],
                  'gblinear': ['gblinear'],
                  'random_state' : [1],
                  'objective': ["reg:squarederror"],
                  'base_score': [0.2, 0.3, 0.5, 0.6]
                   }
    
    #parameters_grid_xgb = {'n_estimators': [100, 300, 500],
    #              'learning_rate': [0.1, 0.01, 0.001], 
    #              'max_depth': [3, 5, 7], 
    #              'subsample': [0.6, 0.8, 1.0]}
    
    # Create the grid search object
    xgb_tuned = XGBRegressor()
    grid_search = GridSearchCV(xgb_tuned, parameters_grid_xgb, cv=5, n_jobs=-1, verbose=2)
    
    # Fit the grid search to the data
    grid_search.fit(X_train, y_train["price_log"])
    
    # Train a new XGBoost model with the best hyperparameters
    xgb_tuned = xgboost.XGBRegressor(max_depth=grid_search.best_params_['max_depth'],  
                                 learning_rate=grid_search.best_params_['learning_rate'])
    xgb_tuned.fit(X_train, y_train["price_log"])
    
    # Print the best parameters and the best score
    print("Best parameters: ", grid_search.best_params_)
    print("Best score: ", grid_search.best_score_)
    
    Fitting 5 folds for each of 108 candidates, totalling 540 fits
    [12:57:51] WARNING: C:/buildkite-agent/builds/buildkite-windows-cpu-autoscaling-group-i-08de971ced8a8cdc6-1/xgboost/xgboost-ci-windows/src/learner.cc:767: 
    Parameters: { "gblinear" } are not used.
    
    Best parameters:  {'base_score': 0.2, 'gblinear': 'gblinear', 'learning_rate': 0.1, 'max_depth': 7, 'objective': 'reg:squarederror', 'random_state': 1, 'subsample': 0.6}
    Best score:  0.9437869065135688
    
    In [80]:
    xgb_tuned_model = get_model_score(xgb_tuned)
    
    R-square on training set :  0.9753569578882522
    R-square on test set :  0.900043951077762
    RMSE on training set :  1.7894703752028756
    RMSE on test set :  3.3524115677680077
    
    In [81]:
    print(xgb_tuned.feature_importances_)
    
    [4.82342169e-02 3.04709561e-03 2.69879419e-02 2.22238347e-01
     4.91382927e-03 2.49591423e-03 4.70170705e-03 1.37657986e-03
     4.91037639e-03 1.67038862e-03 5.50906779e-03 2.65574083e-03
     1.27263251e-03 7.44793704e-03 1.92655041e-03 2.90573016e-03
     1.81476008e-02 0.00000000e+00 1.58035532e-02 4.96353023e-02
     1.48158043e-03 2.08656117e-03 4.32528974e-03 6.34489348e-03
     1.12624569e-02 0.00000000e+00 6.01363275e-03 9.86020779e-04
     3.19298799e-03 0.00000000e+00 3.20846494e-03 1.80704109e-02
     5.44862822e-03 0.00000000e+00 2.58290302e-03 1.68080023e-03
     0.00000000e+00 1.30774146e-02 1.46347173e-02 2.51906924e-03
     1.25941569e-02 1.33351181e-02 6.44829241e-04 2.77831568e-03
     2.87809037e-03 3.77682899e-03 1.17079755e-02 0.00000000e+00
     1.46692265e-02 7.55658373e-03 7.60357082e-03 3.30722588e-03
     0.00000000e+00 3.23502510e-03 3.90503160e-03 0.00000000e+00
     1.39414729e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 9.67784412e-03 2.74982816e-03 0.00000000e+00
     0.00000000e+00 2.30923691e-03 6.73454662e-04 5.18884801e-04
     0.00000000e+00 0.00000000e+00 6.30957715e-04 2.56602117e-03
     2.03107629e-04 0.00000000e+00 1.23444432e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 2.29360070e-03 1.19749375e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 5.75470319e-03
     0.00000000e+00 0.00000000e+00 2.55154795e-03 0.00000000e+00
     0.00000000e+00 2.82438501e-04 6.69653527e-05 0.00000000e+00
     2.32550967e-03 0.00000000e+00 0.00000000e+00 4.99604130e-03
     1.43472548e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.10884127e-03 0.00000000e+00 5.44989074e-04 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 2.51894956e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 1.81735586e-03
     7.10084569e-05 1.00438215e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     2.82172789e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 3.95046169e-04 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 2.26213061e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 5.28310891e-04
     1.00722631e-04 0.00000000e+00 0.00000000e+00 1.03136653e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 8.90382973e-04 2.81361081e-02 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 1.96935912e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 7.47300265e-03 0.00000000e+00
     0.00000000e+00 8.63473222e-04 1.85798306e-03 1.76364940e-03
     0.00000000e+00 0.00000000e+00 2.28682905e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 7.75345718e-04 0.00000000e+00 3.24704306e-04
     0.00000000e+00 0.00000000e+00 0.00000000e+00 3.59659921e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 2.00537872e-03 1.24787390e-02
     0.00000000e+00 0.00000000e+00 0.00000000e+00 6.75287098e-04
     2.32976628e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 3.85230035e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.13905477e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 7.66038662e-04 2.98682571e-04 0.00000000e+00
     0.00000000e+00 3.70501366e-05 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.48410629e-03 0.00000000e+00 7.42063159e-04 4.01668996e-03
     0.00000000e+00 0.00000000e+00 1.39436114e-03 2.28655594e-03
     1.89668022e-03 0.00000000e+00 1.62448827e-03 1.18555792e-03
     0.00000000e+00 9.60857491e-04 3.08321207e-03 4.04560589e-04
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 1.36131479e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 6.45984430e-04 1.34100218e-03
     1.91785907e-03 7.30057072e-04 0.00000000e+00 1.39662554e-03
     9.38368321e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
     3.07860924e-03 2.53717299e-03 0.00000000e+00 0.00000000e+00
     1.05796766e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 1.40830374e-03 0.00000000e+00 0.00000000e+00
     9.51308990e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 3.35676712e-03 2.46688724e-03
     1.01643393e-03 0.00000000e+00 4.00162907e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 1.60315854e-03 0.00000000e+00
     0.00000000e+00 8.83668312e-04 0.00000000e+00 0.00000000e+00
     1.17737136e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.84921781e-03 0.00000000e+00 1.61317724e-03 1.47309329e-03
     0.00000000e+00 4.56564630e-05 1.61031948e-03 0.00000000e+00
     0.00000000e+00 6.90729183e-04 0.00000000e+00 0.00000000e+00
     1.49398169e-03 0.00000000e+00 5.19098539e-04 0.00000000e+00
     0.00000000e+00 3.19659058e-03 2.21338160e-02 1.75125932e-03
     0.00000000e+00 8.48362542e-05 0.00000000e+00 0.00000000e+00
     3.14793061e-03 3.31128744e-04 3.52563336e-03 5.20016765e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 1.54570665e-03 1.43168843e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 3.60956241e-04
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 1.55030226e-03 4.71756735e-04
     1.16079347e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     2.29868665e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.24912025e-04 2.26553320e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 2.55203433e-03 0.00000000e+00
     0.00000000e+00 4.01135476e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 3.03552975e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 4.83057083e-04
     0.00000000e+00 2.39945762e-03 0.00000000e+00 0.00000000e+00
     1.79978480e-04 0.00000000e+00 0.00000000e+00 4.49185632e-03
     2.81818473e-04 3.24843451e-04 0.00000000e+00 1.06444373e-03
     4.38893773e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 4.56465859e-05
     0.00000000e+00 2.07749358e-03 1.52391335e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.42667186e-03 0.00000000e+00 2.54524848e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 8.22352595e-04
     0.00000000e+00 1.31638569e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 5.77138271e-04 0.00000000e+00
     8.10846541e-05 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.76873372e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 3.23613902e-04 0.00000000e+00 0.00000000e+00
     7.83919764e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     1.77468476e-03 9.57749435e-04 1.24305312e-03 1.62725733e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 1.08214340e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     2.09035097e-05 0.00000000e+00 0.00000000e+00 1.72831630e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 8.84039211e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 1.90723955e-03 1.51019916e-03 1.04899157e-03
     1.22241059e-03 0.00000000e+00 0.00000000e+00 2.76545499e-04
     0.00000000e+00 6.68147160e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 3.40018608e-03
     0.00000000e+00 5.53837651e-03 0.00000000e+00 9.96899791e-04
     0.00000000e+00 0.00000000e+00 6.72845170e-04 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 4.74646949e-04
     6.61463244e-04 5.01455856e-04 0.00000000e+00 1.12606329e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 1.85771275e-03
     9.98617732e-04 0.00000000e+00 0.00000000e+00 1.44963630e-03
     0.00000000e+00 8.87809496e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 4.11549292e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 2.43721437e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 9.76809068e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 9.80447046e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 1.51952624e-03 0.00000000e+00 0.00000000e+00
     1.24181178e-03 0.00000000e+00 2.24296725e-03 1.52327179e-03
     0.00000000e+00 0.00000000e+00 1.85737829e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00 7.04878243e-04 3.22350999e-03
     0.00000000e+00 1.49491534e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 1.16664683e-03 0.00000000e+00
     7.29298044e-04 8.62892950e-04 0.00000000e+00 0.00000000e+00
     3.19078070e-04 0.00000000e+00 4.23092657e-04 0.00000000e+00
     0.00000000e+00 1.15009432e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 3.21566855e-04 0.00000000e+00 0.00000000e+00
     9.21438041e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 6.52769255e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 4.43862891e-03 1.71486195e-03
     0.00000000e+00 5.85608999e-04 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 9.50066315e-04 0.00000000e+00
     0.00000000e+00 1.18819252e-03 0.00000000e+00 0.00000000e+00
     4.95065062e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
     8.30489909e-04 0.00000000e+00 0.00000000e+00 1.66947639e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 7.16061913e-04 0.00000000e+00
     0.00000000e+00 0.00000000e+00 4.12203575e-04 2.16467259e-03
     0.00000000e+00 1.36228057e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 6.59508747e-04 1.04799413e-03
     0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
     0.00000000e+00 0.00000000e+00 2.73426645e-03 1.43703620e-03
     1.55118993e-03 3.12204729e-03 0.00000000e+00 7.89886224e-04
     0.00000000e+00 0.00000000e+00 0.00000000e+00 8.46484967e-04
     1.18166977e-03 1.39140326e-03 0.00000000e+00 0.00000000e+00
     0.00000000e+00 2.87603092e-04 1.06549554e-03 0.00000000e+00
     0.00000000e+00 5.14614535e-03 0.00000000e+00 5.70750854e-05
     0.00000000e+00 0.00000000e+00 0.00000000e+00 2.26380792e-03
     6.66811713e-04 0.00000000e+00 5.41912916e-04 1.59671612e-03
     0.00000000e+00 0.00000000e+00 1.76820485e-03 0.00000000e+00
     0.00000000e+00 0.00000000e+00]
    
    In [82]:
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(xgb_tuned.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    ADABoost¶

    In [83]:
    from sklearn.ensemble import AdaBoostRegressor
    from sklearn.datasets import make_regression
    
    # define model
    ada_regr = AdaBoostRegressor (random_state= 0) 
    
    # Fitting the model
    ada_regr.fit(X_train, y_train['price_log'])
    AdaBoostRegressor(n_estimators=100, random_state= 0)
    
    # Model Performance on the test data
    ada_score = get_model_score(ada_regr)
    
    R-square on training set :  0.697458964788525
    R-square on test set :  0.6681917030759577
    RMSE on training set :  6.270028252038547
    RMSE on test set :  6.107963185948226
    
    In [84]:
    #plot graph of feature importances for better visualization
    
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(ada_regr.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    GradientBoost¶

    In [85]:
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.datasets import make_regression
    
    # define model
    gradient_reg = GradientBoostingRegressor(random_state=0)
    
    # Fitting the model
    gradient_reg.fit(X_train, y_train['price_log'])
    GradientBoostingRegressor(random_state=0)
    
    # Model Performance on the test data
    gradient_score = get_model_score(gradient_reg)
    
    R-square on training set :  0.916006981282117
    R-square on test set :  0.8558999137194322
    RMSE on training set :  3.3036874114402477
    RMSE on test set :  4.025176333238123
    
    In [86]:
    #plot graph of feature importances for better visualization
    
    plt.figure(figsize = (12,8))
    feat_importances = pd.Series(gradient_reg.feature_importances_, index=X.columns)
    feat_importances.nlargest(20).plot(kind='barh')
    plt.show()
    

    Comparison of Models¶

    Observations and insights: __

    In [87]:
    # Defining list of models you have trained
    
    #models = [lr, olsmodel1, olsmodel2, dtree, ridge, dtree_tuned,clf, randomforest_tuned,knn,xgb,ada_regr,gradient_reg]
    models = [lr,ridge,dtree,dtree_tuned, clf,randomforest_tuned,knn,xgb,xgb_tuned, ada_regr,gradient_reg]
    
    
    # Defining empty lists to add train and test results
    r2_train = []
    r2_test = []
    rmse_train = []
    rmse_test = []
    
    # Looping through all the models to get the rmse and r2 scores
    for model in models:
        
        # Accuracy score
        j = get_model_score(model, False)
        
        r2_train.append(j[0])
        
        r2_test.append(j[1])
        
        rmse_train.append(j[2])
        
        rmse_test.append(j[3])
    
    In [88]:
    # We exclude OLS (R2 and Adjusted R2) and Lasso from the comparison as they are not good contenders for the model
    
    # comparison_frame = pd.DataFrame({'Model':['Linear Regression','OLS - R2', 'OLS - AdjR2','Decision Tree', 'Ridge','Tuned Decision Tree','Tuned Random Forest','KNN','XGBoost','ADABoost', 'GradiantBoost'], 
    #                                         'Train_r2': r2_train,'Test_r2': r2_test,
    #                                         'Train_RMSE': rmse_train,'Test_RMSE': rmse_test}) 
    
    comparison_frame = pd.DataFrame({'Model':['Linear Regression','Ridge','Decision Tree', 'Tuned Decision Tree','Random Forest','Tuned Random Forest','KNN','XGBoost','XGBoost Tuned','ADABoost', 'GradiantBoost'], 
                                              'Train_r2': r2_train,'Test_r2': r2_test,
                                              'Train_RMSE': rmse_train,'Test_RMSE': rmse_test})                         
                            
    comparison_frame
    
    Out[88]:
    Model Train_r2 Test_r2 Train_RMSE Test_RMSE
    0 Linear Regression 0.963233 0.888446 2.185791 3.541570
    1 Ridge 0.952489 0.912363 2.484690 3.139037
    2 Decision Tree 0.999999 0.804616 0.010430 4.687024
    3 Tuned Decision Tree 0.889225 0.786208 3.794007 4.902853
    4 Random Forest 0.983413 0.875839 1.468100 3.736332
    5 Tuned Random Forest 0.920943 0.835397 3.205151 4.302006
    6 KNN 0.892492 0.786047 3.737639 4.904690
    7 XGBoost 0.979437 0.903678 1.634643 3.290909
    8 XGBoost Tuned 0.975357 0.900044 1.789470 3.352412
    9 ADABoost 0.697459 0.668192 6.270028 6.107963
    10 GradiantBoost 0.916007 0.855900 3.303687 4.025176
    In [89]:
    import matplotlib.pyplot as plt
    
    plt.figure(figsize=(15,5))
    
    # Set the width of the bar
    barWidth = 0.4
     
    # Set the position of the bars
    bar1 = np.arange(len(comparison_frame))
    bar2 = [x + barWidth for x in bar1]
    
    # Create the bars for R2
    colors = ['blue', 'red', 'green', 'yellow']
    plt.bar(bar1, comparison_frame['Train_r2'], width=barWidth, edgecolor='black', label='Train R-squared', color=[0.68, 0.85, 0.90])
    plt.bar(bar2, comparison_frame['Test_r2'], width=barWidth, edgecolor='black', label='Test R-squared', color=[1.00, 0.80, 0.60])
    
    # Create the bars for RMSE
    plt.bar(bar1, comparison_frame['Train_RMSE'], width=barWidth, edgecolor='black', label='Train RMSE', bottom=comparison_frame['Train_r2'])
    plt.bar(bar2, comparison_frame['Test_RMSE'], width=barWidth, edgecolor='black', label='Test RMSE', bottom=comparison_frame['Test_r2'])
    
    # Add axis labels and a title
    plt.xlabel('Model')
    plt.ylabel('Values')
    plt.title('Comparison of R-squared and RMSE Values ')
    
    # Set the x-axis tick labels
    plt.xticks([r + barWidth/2 for r in range(len(comparison_frame))], comparison_frame['Model'], rotation=45)
    
    # Create the legend
    plt.legend()
    
    # Show the plot
    plt.show()
    
    In [90]:
    # Exclude OLS (both R2 and Adjusted R2) and Lasso from graphic as they are throwing off the visualization and are not good contenders for the model
    

    Observations: _

    We completly excluded 3 of the 15 models from the analysis since the results were pretty bad (OLS (both R2 and Adjusted R2) and Lasso). Almost all other the models had similar R2 or adjusted R2 values. The variation occured in the RMSE values. Our criteria was to endsure R2 values were closest to 1, and the RMSE values were close.

    Conclusions¶

    Note: You can also try some other algorithms such as KNN and compare the model performance with the existing ones.

    Insights¶

    REFINED INSIGHTS::

    • What are the most meaningful insights from the data relevant to the problem?

    Based on the data, the most common factors affecting the price of a used car are Power, Year, Engine and Mileage. It is particularly interesting that Power appeared as the most important factor, except for in SKLearn linear regression. Based on previous domain knowledge, I would expect Year to be most significant, followed by Year.

    COMPARISON OF TECHNIQUES AND THEIR RELATIVE PERFORMANCE:

    • How do different techniques perform? Which one is performing relatively better? Is there scope to improve the performance further?
    1. For Milestone 2, I created the following models:
    2. Linear Regression (SKLearn)
    3. Linear Regression (Statsmodel) using R2
    4. Linear Regression (Statsmodel) using Adjusted R2
    5. Decision Tree
    6. Decision Tree Tuned
    7. Ridge
    8. Lasso
    9. Random Forest
    10. Random Forest Tuned
    11. KNN
    12. XGBoost
    13. XGBoost Tuned
    14. ADA
    15. Gradient

    R-squared is a measure of how well the model fits the data, with a value of 1 indicating a perfect fit. A higher R-squared value indicates that the model is a better fit for the data. RMSE measures the difference between the predicted and actual values and a lower value indicates a better fit. The following is an ordered ranking from worst to best of the models:

    1. OLS R2, OLS R2 Adjusted and Lasso dropped from the comparison as they are not good contenders for the model.
    2. Linear Regression, has good Train_r2 score but mediocre Test_r2 and Test_RMSE scores.
    3. Decision Tree, has very high Train_r2 score and low Train_RMSE but low Test_r2 and high Test_RMSE, indicating overfitting.
    4. Ridge, has a good balance between Train_r2 and Test_r2 scores and relatively low Test_RMSE.
    5. Tuned Decision Tree, has lower Train_r2 and Test_r2 scores and higher Test_RMSE compared to other models.
    6. Tuned Random Forest, has similar performance as Ridge.
    7. KNN, has low Train_r2 and Test_r2 scores and high Test_RMSE.
    8. ADABoost, has low Train_r2 and Test_r2 scores and high Test_RMSE, indicating poor performance.
    9. GradientBoost, has relatively good Train_r2 and Test_r2 scores and low Test_RMSE.
    10. XGBoost is the best model to present for a used car price prediction task.

    SCOPE FOR IMPROVEMENT:

    Based on the given test R2 and test RMSE values for the XGBoost model, it appears that there is some scope for improvement. The XGBoost model has a test R2 of 0.903678 and a test RMSE of 3.290909. These metrics indicate that the model is not perfectly capturing the target variable, and there may be some room for improvement.

    FURTHER IMPROVEMENT:

    Feature selection: I could look at other methods and perform Recursive Feature Elimination to remove features that have a low impact on the model and can improve its performance and reduce overfitting. The algorithm works by removing the least important features based on the weights or coefficients of the model, and the process can be repeated until a desired number of features is reached.

    Data augmentation: I could generate more data by using techniques such as random rotations, shifts, and flips to increase the size of my dataset and reduce overfitting.

    Feature Engineering: I could look at improving the quality and relevance of the features. This could involve creating new features from existing ones, transforming features, or removing irrelevant or redundant features. For example, we could elminate Owner, which does not appear as a high runner feature.

    Further hyperparameter tuning: XGBoost has several hyperparameters that can be tuned to improve its performance. Some of these include tuning the learning rate, number of trees, maximum depth of trees, and others to find the optimal values for the dataset. I tried a number of combinations, and they resulted in similar values as the non-tuned model. Tuning these feature was heavy computationally and time wise. If I had more time, I would try other combinations to determine the one that results in the best performance.

    PROPOSAL FOR FINAL SOLUTION DESIGN:

    • What model do you propose to be adopted? Why is this the best solution to adopt?

    FINAL SOLUTION DESIGN

    About XGBoost:

    XGBoost (eXtreme Gradient Boosting) is an open-source software library that provides a fast and efficient implementation of gradient boosting for machine learning. XGBoost, it is an ensemble method based on the gradient boosting algorithm, which is an iterative optimization process that adjusts the weights of the weak models so that the combined predictions minimize a loss function, such as mean squared error for regression. By training a sequence of weak models, where each subsequent model aims to correct the errors of the previous model, the final prediction is made by combining the predictions of all the individual models. In XGBoost, the individual models are decision trees which are created in sequential form. Weights are assigned to all the independent variables which are then fed into the decision tree which predicts results. The weight of variables predicted wrong by the tree is increased and these variables are then fed to the second decision tree. By increasing the weight of misclassified instances, subsequent trees in the ensemble put more emphasis on correctly classifying these instances, which helps to improve the overall accuracy of the model. This is a key aspect of the boosting technique used in XGBoost. These individual classifiers/predictors then ensemble to give a strong and more precise model. More generally, it can work on regression, classification, ranking, and user-defined prediction problems.
    https://www.geeksforgeeks.org/xgboost/

    Recommendation:

    We conducted exploratory data analytics (EDA) analysis on the used_car dataset. We then created 13 linear regression models and evaluated their success outcomes (R2 and RMSE values). Our investigation revealed that XGBoost is the best way to predict the price of a used car based on the Data Dictionary provided for the following reasons:

    Handling of numerical and categorical features: XGBoost handled both numerical and categorical features, making it well-suited for this problem as it handled the numerical features such as "Year", "Kilometers_driven", "Mileage", "Engine", "Power", and "New_Price", as well as categorical features such as "Location", "Fuel_Type", "Transmission", "Owner", and "Seats".

    Model interpretability: XGBoost provided built-in feature importance scores and visualization tools, which helped to understand the relative importance of different features in determining the price of a used car. This made it easier to understand and communicate the results of my model, as well as to identify potential areas for improvement.

    Non-linear relationships: XGBoost handled non-linear relationships between features and target outcomes, which is important as the relationship between features such as "Kilometers_driven", "Engine", and "Power" and the target "Price" are likely to be non-linear.

    Handling of missing values: XGBoost can handle missing values, which may be present in the dataset, making it a versatile tool for regression tasks. While we handled missing values in Milestone 1, it is useful to know that this is the case for future data analysis.

    Performance: XGBoost is known for its fast training speed and high prediction , which makes it well-suited for large datasets. Since I am not familier with how long it should take to run, I am citing this as a general advantage of XGBoost.

    Scalability: From the literature, one of the key strengths of XGBoost is its scalability. It can handle datasets with millions of examples and thousands of features, making it a popular choice for working with big data. Additionally, XGBoost has a number of advanced features that make it a highly customizable and flexible tool, such as support for parallel processing, tree pruning, and weighting of examples. Scalability may become important as more data becomes available, and as the company correlates different databases to derive intelligent insights other than pricing. For example, Cars4U can use the existing data and add vehicle service contract information to help determine the correct price of the extended warranty.

    Handling complexity: Ensemble methods are particularly useful when dealing with complex, non-linear relationships between features and target outcomes. By combining the predictions of multiple models, ensemble methods can often achieve higher accuracy and better generalization performance than individual models. Additionally, ensemble methods can also help to mitigate overfitting, which can be a problem when training models on large, complex datasets. Again, this will become even more important if the business want to use other databases to provide additional value and insights to their customers.

    Ease of use: XGBoost automates the process of training individual decision trees and combining their predictions, so that you don't have to worry about manual tuning of model parameters or worry about overfitting. XGBoost also provides a number of hyperparameters that you can tune to control the size and complexity of the individual decision trees, as well as the number of trees and the learning rate used in the optimization process.

    Open source: There are several benefits to using open source machine learning models. These advantages are true for all the models used in this milestone. We include them here for completeness as we are asked to substantiate our recommendation for XGBoost.

    Cost Effective: Open source machine learning models are free to use, which can save a significant amount of money compared to proprietary software. a) Customizable: Since the source code is publicly available, users have the flexibility to modify and tweak the model to better suit their specific use case. b) Large Community: Open source machine learning models often have a large and active community of contributors, which can result in regular updates and bug fixes. c) Better Integration: Open source machine learning models can be easily integrated into other open source tools and technologies, leading to a more streamlined workflow. d) Transparency: The transparency of open source machine learning models allows users to understand how the model works, which can help build trust in the model's predictions.

    Overall, XGBoost can be a good choice for determining the price of a used car based on the above data dictionary as it can handle a variety of data types, provide insights into the relative importance of features, handle non-linear relationships, and achieve high prediction accuracy.

    In [ ]: